In this document, we build a linear log odds model of probability of superiority judgments through a process of model expansion, where we will gradually add predictors to our model.
The LLO model follows from related work suggesting that human perception of probability is encoded on a log odds scale. On this scale, the slope of a linear model represents the shape and severity of the function describing bias in probability perception. The greater the deviation from a slope of 1 (i.e., ideal performance), the more biased the judgments of probability. Slopes less than one correspond to the kind of bias predicted by excessive attention to the mean. On the same log odds scale, the intercept reflects a crossover point, which should be related to the number of categories of possible outcomes among which probability is divided. In our case, the crossover point should be about 0.5 (i.e., an intercept near zero in log odds units) since workers are judging the probability of a team getting more points with a new player than without.
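The LLO relationship can be sketched as a small helper function (an illustration with made-up slope and intercept values, not the model we fit below):

```r
# linear-log-odds (LLO) distortion of a true probability p:
# perceived log odds = intercept + slope * true log odds
llo <- function(p, slope, intercept) {
  plogis(intercept + slope * qlogis(p))
}

# with slope = 1 and intercept = 0, perception is unbiased
llo(0.8, slope = 1, intercept = 0)   # 0.8

# a slope < 1 pulls extreme probabilities toward the crossover point,
# the kind of bias predicted by excessive attention to the mean
llo(0.95, slope = 0.5, intercept = 0)  # ~0.81, i.e., underestimation
```

With a crossover at 0.5 (intercept of zero in log odds), probabilities above 0.5 are underestimated and probabilities below 0.5 are overestimated whenever the slope is less than one.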
Load and Prepare Data
We load worker responses from our experiment and do some preprocessing.
# read in data
full_df <- read_csv("experiment-anonymous.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## workerId = col_character(),
## condition = col_character(),
## start_means = col_logical(),
## gender = col_character(),
## age = col_character(),
## education = col_character(),
## chart_use = col_character(),
## strategy_with_means = col_character(),
## strategy_without_means = col_character(),
## outcome = col_logical(),
## trial = col_character(),
## trialIdx = col_character()
## )
## See spec(...) for full column specifications.
# preprocessing
responses_df <- full_df %>%
rename( # rename to convert away from camel case
worker_id = workerId,
ground_truth = groundTruth,
sd_diff = sdDiff,
p_award_with = pAwardWith,
p_award_without = pAwardWithout,
account_value = accountValue,
p_superiority = pSup,
start_time = startTime,
resp_time = respTime,
trial_dur = trialDur,
trial_idx = trialIdx
) %>%
# remove practice and mock trials from responses dataframe, leave in full version
filter(trial_idx != "practice", trial_idx != "mock") %>%
# drop rows where p_superiority == NA for some reason
drop_na(p_superiority) %>%
# mutate rows where intervene == -1 for some reason
mutate(
intervene = if_else(intervene == -1,
# repair
if_else((payoff == (award_value - 1) | payoff == -1),
1, # paid for intervention
0), # didn't pay for intervention
# don't repair
as.numeric(intervene) # hack to avoid type error
)
) %>%
# set up factors for modeling
mutate(
# add a variable to note whether the chart they viewed showed means
means = as.factor((start_means & as.numeric(trial) <= (n_trials / 2)) | (!start_means & as.numeric(trial) > (n_trials / 2))),
start_means = as.factor(start_means),
sd_diff = as.factor(sd_diff),
trial_number = as.numeric(trial)
)
head(responses_df)
## # A tibble: 6 x 38
## worker_id batch n_trials n_data_conds condition baseline es_threshold
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 7819bfb6 17 34 18 intervals 0.5 0.9
## 2 7819bfb6 17 34 18 intervals 0.5 0.9
## 3 7819bfb6 17 34 18 intervals 0.5 0.9
## 4 7819bfb6 17 34 18 intervals 0.5 0.9
## 5 7819bfb6 17 34 18 intervals 0.5 0.9
## 6 7819bfb6 17 34 18 intervals 0.5 0.9
## # … with 31 more variables: start_means <fct>, award_value <dbl>,
## # starting_value <dbl>, exchange <dbl>, cutoff <dbl>, max_bonus <dbl>,
## # total_bonus <dbl>, duration <dbl>, numeracy <dbl>, gender <chr>, age <chr>,
## # education <chr>, chart_use <chr>, strategy_with_means <chr>,
## # strategy_without_means <chr>, account_value <dbl>, ground_truth <dbl>,
## # intervene <dbl>, outcome <lgl>, p_award_with <dbl>, p_award_without <dbl>,
## # p_superiority <dbl>, payoff <dbl>, resp_time <dbl>, sd_diff <fct>,
## # start_time <dbl>, trial <chr>, trial_dur <dbl>, trial_idx <chr>,
## # means <fct>, trial_number <dbl>
We need the data in a format prepared for modeling. We censor responses to the range 0.5% to 99.5%, treating responses at these bounds as intended responses at or beyond the bound. Censoring at half a percentage point amounts to assuming that the response scale has an effective resolution of 1% in practice. This step is necessary to avoid values of positive or negative infinity when we transform responses to a log odds scale. We convert both probability of superiority judgments and the ground truth to a logit scale.
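A quick check (not part of the pipeline) shows why censoring is necessary before the transform:

```r
# responses of exactly 0% or 100% are unusable on the log odds scale
qlogis(c(0, 1))          # -Inf  Inf

# censoring to [0.5%, 99.5%] keeps transformed values finite
qlogis(c(0.005, 0.995))  # about -5.29 and 5.29
```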
# create data frame for model
model_df <- responses_df %>%
mutate(
# recode responses greater than 99.5% and less than 0.5% to avoid values of +/- Inf on a logit scale
p_superiority = if_else(p_superiority > 99.5,
99.5,
if_else(p_superiority < 0.5,
0.5,
as.numeric(p_superiority))),
# apply logit function to p_sup judgments and ground truth
lo_p_sup = qlogis(p_superiority / 100),
lo_ground_truth = qlogis(ground_truth),
# # scale and center lo_ground_truth
# clo_ground_truth = (lo_ground_truth - mean(lo_ground_truth)) / (max(lo_ground_truth) - min(lo_ground_truth)),
# scale and center trial order
trial = (trial_number - as.numeric(n_trials) / 2) / as.numeric(n_trials)
)
Now, let’s apply our exclusion criteria, cutting our sample down to the subset of participants who passed both attention checks.
# determine exclusions
exclude_df <- model_df %>%
# attention check trials where ground truth = c(0.5, 0.999)
mutate(failed_check = (ground_truth == 0.5 & intervene != 0) | (ground_truth == 0.999 & intervene != 1)) %>%
group_by(worker_id) %>%
summarise(
failed_attention_checks = sum(failed_check),
unique_p_sup = length(unique(p_superiority)),
# excluded if they failed either attention check or used fewer than three levels of the response scale
exclude = failed_attention_checks > 0 | unique_p_sup < 3
) %>%
dplyr::select(worker_id, exclude)
# apply exclusion criteria and remove attention check trials from modeling data set
model_df <- model_df %>%
left_join(exclude_df, by = "worker_id") %>%
filter(exclude == FALSE) %>%
filter(ground_truth > 0.5 & ground_truth < 0.999)
# how many remaining workers per condition?
model_df %>%
group_by(condition, start_means) %>% # between subject manipulations
summarise(
n_workers = length(unique(worker_id))
)
## # A tibble: 8 x 3
## # Groups: condition [4]
## condition start_means n_workers
## <chr> <fct> <int>
## 1 densities FALSE 79
## 2 densities TRUE 78
## 3 HOPs FALSE 79
## 4 HOPs TRUE 76
## 5 intervals FALSE 80
## 6 intervals TRUE 80
## 7 QDPs FALSE 77
## 8 QDPs TRUE 77
In addition to excluding participants who failed at least one of the two attention checks in the experiment, which is our preregistered exclusion criterion, we also exclude a handful of workers whose data lead to model fit issues. These are workers who responded with only one or two levels of the probability of superiority scale. We could make the case that these workers were not trying very hard, but the reason for excluding them is more practical: the modeling process we are using cannot estimate random effects on response variability for these participants (i.e., you cannot calculate the variance of a set with only one or two distinct values). These random effects on variance are important because our data almost certainly violate a homogeneity of variance assumption.
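The problem is easy to demonstrate with a toy example (illustrative only, not part of the analysis):

```r
# a worker who always answers 50% gives no information about their
# response variability: the sample variance is exactly zero
responses <- rep(50, 10)
var(responses)      # 0

# on the log scale used for the sigma submodel, that degenerate
# variability would push the worker-level estimate toward -Inf
log(sd(responses))  # -Inf
```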
Because of these exclusions, we are a few participants short of our target sample size of 80. We should still have more than enough data to support statistical inferences. Here we drop a handful of additional participants to maintain counterbalancing of block order. Since we know that some participants had dropped responses, we prioritize leaving out the workers with the greatest number of dropped trials in each counterbalancing condition.
model_df %>%
group_by(condition, start_means, worker_id) %>%
summarise(
n_trials = n(),
dropped_trials = 32 - n_trials
) %>%
filter(dropped_trials > 0)
## # A tibble: 5 x 5
## # Groups: condition, start_means [3]
## condition start_means worker_id n_trials dropped_trials
## <chr> <fct> <chr> <int> <dbl>
## 1 densities TRUE e4b46997 24 8
## 2 HOPs FALSE c488db75 5 27
## 3 HOPs FALSE ce016e09 25 7
## 4 HOPs FALSE f430e2e8 28 4
## 5 intervals FALSE ff8a2a69 28 4
Based on a comparison of the two tables above, we’ll drop workers c488db75, ce016e09, and f430e2e8 to ensure our ability to fit our model.
# remove workers with the most missing data
model_df <- model_df %>%
filter(!worker_id %in% c("c488db75", "ce016e09", "f430e2e8")) # to fully counterbalance we would also exclude "c337674a" (condition = densities, start_means = FALSE), but rerunning the models would take a long time
model_df %>%
group_by(condition, start_means) %>% # between subject manipulations
summarise(
n_workers = length(unique(worker_id))
)
## # A tibble: 8 x 3
## # Groups: condition [4]
## condition start_means n_workers
## <chr> <fct> <int>
## 1 densities FALSE 79
## 2 densities TRUE 78
## 3 HOPs FALSE 76
## 4 HOPs TRUE 76
## 5 intervals FALSE 80
## 6 intervals TRUE 80
## 7 QDPs FALSE 77
## 8 QDPs TRUE 77
Now we have our dataset ready for modeling.
Distribution of Probability of Superiority Judgments
We start as simply as possible by just modeling the distribution of probability of superiority judgments on the log odds scale.
Before we fit the model to our data, let’s check that our priors seem reasonable. We’ll use a weakly informative prior for the intercept parameter since we want the population-level centered intercept to be flexible. We set the expected value of the prior on the intercept equal to the mean value of the ground truth that we sampled (in log odds units).
# get mean value of ground truth sampled in log odds units
model_df %>% select(lo_ground_truth) %>% summarize(mean = mean(lo_ground_truth))
## # A tibble: 1 x 1
## mean
## <dbl>
## 1 1.30
# get_prior(data = model_df, family = "gaussian", formula = lo_p_sup ~ 1)
# starting as simple as possible: learn the distribution of lo_p_sup
prior.lo_p_sup <- brm(data = model_df, family = "gaussian",
lo_p_sup ~ 1,
prior = c(prior(normal(1.3, 1), class = Intercept),
prior(normal(0, 1), class = sigma)),
sample_prior = "only",
iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling
## Warning: There were 3 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
Let’s look at our prior predictive distribution. For this intercept-only model, it should be skewed left because we have located our prior near 79% probability of superiority (i.e., plogis(1.3)). We should see a peak near the upper bound of the probability scale.
# prior predictive check
model_df %>%
select() %>%
add_predicted_draws(prior.lo_p_sup, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
mutate(
# transform to probability units
prior_p_sup = plogis(lo_p_sup)
) %>%
ggplot(aes(x = prior_p_sup)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Prior predictive distribution for probability of superiority") +
theme(panel.grid = element_blank())

Now, let’s fit the model to data. This model simply estimates the mean response, ignoring the ground truth.
# starting as simple as possible: learn the distribution of lo_p_sup
m.lo_p_sup <- brm(data = model_df, family = "gaussian",
lo_p_sup ~ 1,
prior = c(prior(normal(1.3, 1), class = Intercept),
prior(normal(0, 1), class = sigma)),
iter = 3000, warmup = 500, chains = 2, cores = 2,
file = "model-fits/lo_mdl")
Check diagnostics:
# trace plots
plot(m.lo_p_sup)

# pairs plot
pairs(m.lo_p_sup)

# model summary
print(m.lo_p_sup)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: lo_p_sup ~ 1
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
## total post-warmup samples = 5000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.57 0.01 0.55 0.59 1.00 2957 3127
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 1.29 0.01 1.28 1.30 1.00 5269 3285
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s check our posterior predictive distribution.
# posterior predictive check
model_df %>%
select() %>%
add_predicted_draws(m.lo_p_sup, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
mutate(
# transform to probability units
post_p_sup = plogis(lo_p_sup)
) %>%
ggplot(aes(x = post_p_sup)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior predictive distribution for probability of superiority",
post_p_sup = NULL) +
theme(panel.grid = element_blank())

How do these predictions compare to the observed data?
# data density
model_df %>%
ggplot(aes(x = p_superiority)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Data distribution for probability of superiority") +
theme(panel.grid = element_blank())

Our model is not sensitive to the ground truth, so we expect to see a mismatch here.
Linear Log Odds Model of Probability of Superiority
Now we’ll add a slope parameter to make our model sensitive to the ground truth. This is the simplest version of our linear log odds (LLO) model.
Before we fit the model to our data, let’s check that our priors seem reasonable. Since we are now including a slope parameter for the ground truth in our model, we can dial down the width of our prior for sigma (i.e., residual variance) to avoid over-dispersion of predicted responses.
# get_prior(data = model_df, family = "gaussian", formula = lo_p_sup ~ lo_ground_truth)
# simple LLO model
prior.llo <- brm(data = model_df, family = "gaussian",
lo_p_sup ~ lo_ground_truth,
prior = c(prior(normal(1, 0.5), class = b),
prior(normal(1.3, 1), class = Intercept),
prior(normal(0, 0.5), class = sigma)),
sample_prior = "only",
iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling
## Warning: There were 2 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
Let’s look at our prior predictive distribution. For this linear model, we should see density spread slightly more evenly across probability values.
# prior predictive check
model_df %>%
select(lo_ground_truth) %>%
add_predicted_draws(prior.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
mutate(
# transform to probability units
prior_p_sup = plogis(lo_p_sup)
) %>%
ggplot(aes(x = prior_p_sup)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Prior predictive distribution for probability of superiority") +
theme(panel.grid = element_blank())

Now let’s fit the model to data.
# simple LLO model
m.llo <- brm(data = model_df, family = "gaussian",
lo_p_sup ~ lo_ground_truth,
prior = c(prior(normal(1, 0.5), class = b),
prior(normal(1.3, 1), class = Intercept),
prior(normal(0, 0.5), class = sigma)),
iter = 3000, warmup = 500, chains = 2, cores = 2,
file = "model-fits/llo_mdl")
Check diagnostics:
# trace plots
plot(m.llo)

# pairs plot
pairs(m.llo)

Our slope and intercept parameters appear highly correlated. Perhaps adding hierarchy to our model will remedy this.
# model summary
print(m.llo)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: lo_p_sup ~ lo_ground_truth
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
## total post-warmup samples = 5000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.13 0.02 -0.16 -0.10 1.00 3695 3646
## lo_ground_truth 0.54 0.01 0.52 0.56 1.00 3685 3739
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 1.20 0.01 1.18 1.21 1.00 6959 4010
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s check our posterior predictive distribution.
# posterior predictive check
model_df %>%
select(lo_ground_truth) %>%
add_predicted_draws(m.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
mutate(
# transform to probability units
post_p_sup = plogis(lo_p_sup)
) %>%
ggplot(aes(x = post_p_sup)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior predictive distribution for probability of superiority") +
theme(panel.grid = element_blank())

How do these predictions compare to the observed data?
# data density
model_df %>%
ggplot(aes(x = p_superiority)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Data distribution for probability of superiority") +
theme(panel.grid = element_blank())

Our model is now sensitive to the ground truth, but it is still having trouble fitting the data. It may be that the model is not capturing the individual variability in response patterns. Next we’ll add hierarchy to our model.
Add Hierarchy for Slope, Intercepts, and Sigma
The models we’ve created thus far fail to account for much of the variability in the data. Here, we attempt to parse some of the heterogeneity in responses by modeling random effects of worker on slopes, intercepts, and residual variance. This introduces a hierarchical component to our model, accounting for individual differences in the best-fitting linear model for each worker’s data.
Before we fit the model to our data, let’s check that our priors seem reasonable. We are adding hyperpriors for the standard deviation of slopes, intercepts, and residual variation (i.e., sigma) per worker, as well as the correlation between them. We’ll set moderately wide priors on these worker-level slope and intercept effects. We want some regularization, but we don’t want to overregularize potentially large individual variability, which is sort of a tough balance. We’ll also narrow the priors on sigma parameters since we are now attributing variability to more sources and we want to avoid overdispersion. We’ll set a prior on the correlation between slopes and intercepts per worker that avoids large absolute correlations.
# get_prior(data = model_df, family = "gaussian", formula = bf(lo_p_sup ~ (1 + lo_ground_truth|sharecor|worker_id) + lo_ground_truth, sigma ~ (1|sharecor|worker_id)))
# hierarchical LLO model
prior.wrkr.llo <- brm(data = model_df, family = "gaussian",
formula = bf(lo_p_sup ~ (1 + lo_ground_truth|sharecor|worker_id) + lo_ground_truth,
sigma ~ (1|sharecor|worker_id)),
prior = c(prior(normal(1, 0.5), class = b),
prior(normal(1.3, 1), class = Intercept),
prior(normal(0, 0.15), class = sd, group = worker_id),
prior(normal(0, 0.15), class = sd, dpar = sigma),
prior(lkj(4), class = cor)),
sample_prior = "only",
iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling
## Warning: There were 5 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
Let’s look at our prior predictive distribution. Because this model contains so many more sources of variation, the prior predictive distribution may look a little overdispersed (i.e., lots of mass at the boundaries of the response scale). However, it’s probably best to err on the side of not making our priors on individual parameters too narrow.
# prior predictive check
model_df %>%
select(lo_ground_truth, worker_id) %>%
add_predicted_draws(prior.wrkr.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
mutate(
# transform to probability units
prior_p_sup = plogis(lo_p_sup)
) %>%
ggplot(aes(x = prior_p_sup)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Prior predictive distribution for probability of superiority") +
theme(panel.grid = element_blank())

Now, let’s fit the model to our data.
# hierarchical LLO model
m.wrkr.llo <- brm(data = model_df, family = "gaussian",
formula = bf(lo_p_sup ~ (1 + lo_ground_truth|sharecor|worker_id) + lo_ground_truth,
sigma ~ (1|sharecor|worker_id)),
prior = c(prior(normal(1, 0.5), class = b),
prior(normal(1.3, 1), class = Intercept),
prior(normal(0, 0.15), class = sd, group = worker_id),
prior(normal(0, 0.15), class = sd, dpar = sigma),
prior(lkj(4), class = cor)),
iter = 3000, warmup = 500, chains = 2, cores = 2,
control = list(adapt_delta = 0.99, max_treedepth = 12),
file = "model-fits/llo_mdl-wrkr")
Check diagnostics:
# trace plots
plot(m.wrkr.llo)


# pairs plot (fixed effects)
pairs(m.wrkr.llo, exact_match = TRUE, pars = c("b_Intercept", "b_lo_ground_truth", "b_sigma_Intercept"))

# pairs plot (random effects)
pairs(m.wrkr.llo, pars = c("sd_worker_id__", "cor_worker_id__"))

# model summary
print(m.wrkr.llo)
## Family: gaussian
## Links: mu = identity; sigma = log
## Formula: lo_p_sup ~ (1 + lo_ground_truth | sharecor | worker_id) + lo_ground_truth
## sigma ~ (1 | sharecor | worker_id)
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
## total post-warmup samples = 5000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept) 0.42 0.03 0.37 0.47 1.00
## sd(lo_ground_truth) 0.45 0.01 0.43 0.48 1.00
## sd(sigma_Intercept) 0.77 0.02 0.73 0.81 1.00
## cor(Intercept,lo_ground_truth) -0.24 0.05 -0.33 -0.15 1.00
## cor(Intercept,sigma_Intercept) -0.47 0.05 -0.56 -0.38 1.00
## cor(lo_ground_truth,sigma_Intercept) 0.58 0.03 0.52 0.64 1.00
## Bulk_ESS Tail_ESS
## sd(Intercept) 542 906
## sd(lo_ground_truth) 433 651
## sd(sigma_Intercept) 1398 2564
## cor(Intercept,lo_ground_truth) 335 894
## cor(Intercept,sigma_Intercept) 442 919
## cor(lo_ground_truth,sigma_Intercept) 1487 2396
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.15 0.02 -0.19 -0.11 1.00 542 1018
## sigma_Intercept -0.73 0.03 -0.79 -0.66 1.00 618 1239
## lo_ground_truth 0.55 0.02 0.51 0.58 1.01 208 565
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s check our posterior predictive distribution.
# posterior predictive check
model_df %>%
select(lo_ground_truth, worker_id) %>%
add_predicted_draws(m.wrkr.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
mutate(
# transform to probability units
post_p_sup = plogis(lo_p_sup)
) %>%
ggplot(aes(x = post_p_sup)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior predictive distribution for probability of superiority") +
theme(panel.grid = element_blank())

How do these predictions compare to the observed data?
# data density
model_df %>%
ggplot(aes(x = p_superiority)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Data distribution for probability of superiority") +
theme(panel.grid = element_blank())

Running a leave-one-out (LOO) posterior predictive check, we can see that overall this model has decent predictive validity.
# set up data for LOO posterior predictive check
y <- model_df$lo_p_sup
yrep <- posterior_predict(m.wrkr.llo)
# run LOO to get weights
loo <- loo(m.wrkr.llo, save_psis = TRUE, cores = 2)
## Warning: Found 183 observations with a pareto_k > 0.7 in model 'm.wrkr.llo'.
## With this many problematic observations, it may be more appropriate to use
## 'kfold' with argument 'K = 10' to perform 10-fold cross-validation rather than
## LOO.
psis <- loo$psis_object
lw <- weights(psis)
ppc_loo_pit_qq(y, yrep, lw = lw)

Let’s look at posterior predictions per worker to get a more detailed sense of fit quality. When we make this kind of plot for model checks at the level of individual workers, we’ll look at a subset of workers to keep the number of charts manageable.
# two workers from each counterbalancing condition
model_check_set <- model_df %>%
group_by(start_means, condition, worker_id) %>%
summarise() %>%
top_n(2)
## Selecting by worker_id
model_check_set <- model_check_set$worker_id
model_check_df <- model_df %>%
filter(worker_id %in% model_check_set)
model_check_df %>%
group_by(worker_id) %>%
summarise()
## # A tibble: 16 x 1
## worker_id
## <chr>
## 1 f27ed3b6
## 2 f4f534e0
## 3 f5d48035
## 4 f796f54d
## 5 f7f69f44
## 6 f83e2827
## 7 fa0f4b94
## 8 fa22b8bb
## 9 fba3405d
## 10 fccb21d5
## 11 fd15ec30
## 12 fd3bea1b
## 13 fdb8555e
## 14 fe8936cd
## 15 fee45dce
## 16 ff8a2a69
model_check_df %>%
# get posterior predictive distribution
group_by(lo_ground_truth, worker_id) %>%
add_predicted_draws(m.wrkr.llo, n = 500) %>%
# plot
ggplot(aes(x = lo_ground_truth, y = lo_p_sup, color = condition, fill = condition)) +
geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = .25) +
geom_point(data = model_check_df) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
coord_cartesian(xlim = quantile(model_df$lo_ground_truth, c(0, 1)),
ylim = quantile(model_df$lo_p_sup, c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

What does this look like in probability units?
model_check_df %>%
# get posterior predictive distribution
group_by(lo_ground_truth, worker_id) %>%
add_predicted_draws(m.wrkr.llo, n = 500) %>%
# plot
ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_p_sup), color = condition, fill = condition)) +
geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = .25) +
geom_point(data = model_check_df) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
coord_cartesian(xlim = quantile(plogis(model_df$lo_ground_truth), c(0, 1)),
ylim = quantile(plogis(model_df$lo_p_sup), c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

One thing we’re trying to gauge here is whether our model has predictive validity at the level of each worker. To examine this more closely, we’ll look at QQ plots for residuals at the worker level.
model_check_df %>%
# get posterior draws and transform
add_predicted_draws(m.wrkr.llo, n = 500) %>%
group_by(lo_ground_truth, worker_id) %>%
summarise(
p_residual = mean(.prediction < lo_p_sup), # what proportion of predicted judgments are less than the observed response?
z_residual = qnorm(p_residual) # what are the z-scores of these cumulative probabilities?
) %>%
# plot
ggplot(aes(sample = z_residual)) +
geom_qq() +
geom_abline() +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

These don’t look great. We can see that there is some clustering of responses, probably reflecting a preference for round numbers on the response scale.
pp_check(m.wrkr.llo)
## Using 10 posterior samples for ppc type 'dens_overlay' by default.

As long as the location and scale of the predictions look reasonably in line with the empirical data (which they do), we don’t care too much if the model fails to predict every small anomaly. This plot showing predictive densities alongside the observed data is reassuring insofar as we are doing a decent job of modeling the things we care about.
Let’s see if our predictive validity improves at the worker level when we add our experimental manipulations as predictors.
Add Predictors to Answer Research Questions
In order to answer our research questions, we need to account for the interaction of the ground truth with whether means are present vs absent, whether visualized uncertainty is high vs low, and which uncertainty visualization condition a worker was assigned to. We’ll add predictors for each of these factors to our hierarchical model in turn.
Presence/Absence of the Mean
Our primary research question is how the presence of the mean impacts the slopes of linear models in log odds space. To test this, we’ll add an interaction between the presence of the mean and the ground truth.
We use the same priors as we did for the previous model. Now, let’s fit the model to our data.
# hierarchical LLO model with fixed effects on slope and residual variance conditioned on the presence/absence of the mean
m.wrkr.means.llo <- brm(data = model_df, family = "gaussian",
formula = bf(lo_p_sup ~ (1 + lo_ground_truth|sharecor|worker_id) + lo_ground_truth*means,
sigma ~ (1|sharecor|worker_id)),
prior = c(prior(normal(1, 0.5), class = b),
prior(normal(1.3, 1), class = Intercept),
prior(normal(0, 0.15), class = sd, group = worker_id),
prior(normal(0, 0.15), class = sd, dpar = sigma),
prior(lkj(4), class = cor)),
iter = 3000, warmup = 500, chains = 2, cores = 2,
control = list(adapt_delta = 0.99, max_treedepth = 12),
file = "model-fits/llo_mdl-wrkr_means")
Check diagnostics:
# trace plots
plot(m.wrkr.means.llo)



# pairs plot (fixed effects)
pairs(m.wrkr.means.llo, exact_match = TRUE, pars = c("b_Intercept",
"b_lo_ground_truth",
"b_meansTRUE",
"b_lo_ground_truth:meansTRUE",
"b_sigma_Intercept"))

# pairs plot (random effects)
pairs(m.wrkr.means.llo, exact_match = TRUE, pars = c("sd_worker_id__Intercept",
"sd_worker_id__lo_ground_truth",
"sd_worker_id__sigma_Intercept"))

# pairs plot (covariance matrix)
pairs(m.wrkr.means.llo, pars = c("cor_worker_id__"))

# model summary
print(m.wrkr.means.llo)
## Family: gaussian
## Links: mu = identity; sigma = log
## Formula: lo_p_sup ~ (1 + lo_ground_truth | sharecor | worker_id) + lo_ground_truth * means
## sigma ~ (1 | sharecor | worker_id)
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
## total post-warmup samples = 5000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept) 0.42 0.03 0.37 0.47 1.00
## sd(lo_ground_truth) 0.46 0.01 0.43 0.48 1.00
## sd(sigma_Intercept) 0.77 0.02 0.73 0.81 1.00
## cor(Intercept,lo_ground_truth) -0.25 0.05 -0.33 -0.15 1.00
## cor(Intercept,sigma_Intercept) -0.47 0.04 -0.56 -0.38 1.00
## cor(lo_ground_truth,sigma_Intercept) 0.58 0.03 0.52 0.64 1.00
## Bulk_ESS Tail_ESS
## sd(Intercept) 763 1288
## sd(lo_ground_truth) 579 1862
## sd(sigma_Intercept) 1377 2391
## cor(Intercept,lo_ground_truth) 732 1837
## cor(Intercept,sigma_Intercept) 799 1594
## cor(lo_ground_truth,sigma_Intercept) 1187 2528
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept -0.15 0.02 -0.19 -0.11 1.00 712
## sigma_Intercept -0.73 0.03 -0.79 -0.67 1.00 845
## lo_ground_truth 0.54 0.02 0.51 0.58 1.01 392
## meansTRUE 0.01 0.01 -0.01 0.02 1.00 9002
## lo_ground_truth:meansTRUE 0.00 0.00 -0.01 0.01 1.00 8119
## Tail_ESS
## Intercept 2011
## sigma_Intercept 1171
## lo_ground_truth 1283
## meansTRUE 3749
## lo_ground_truth:meansTRUE 3434
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s check our posterior predictive distribution.
# posterior predictive check
model_df %>%
select(lo_ground_truth, worker_id, means) %>%
add_predicted_draws(m.wrkr.means.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
mutate(
# transform to probability units
post_p_sup = plogis(lo_p_sup)
) %>%
ggplot(aes(x = post_p_sup)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior predictive distribution for probability of superiority") +
theme(panel.grid = element_blank())

How do these predictions compare to the observed data?
# data density
model_df %>%
ggplot(aes(x = p_superiority)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Data distribution for probability of superiority") +
theme(panel.grid = element_blank())
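Rather than eyeballing two separate density plots, we could also overlay the observed and replicated densities. A sketch of this alternative, assuming the bayesplot package (which brms already uses for its PPC functions) is available:

```r
# sketch: overlay observed and posterior predictive densities on one axis,
# back-transformed to probability units
yrep <- posterior_predict(m.wrkr.means.llo, nsamples = 50)
bayesplot::ppc_dens_overlay(plogis(model_df$lo_p_sup), plogis(yrep))
```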

Running a leave-one-out posterior predictive check, we can see that overall this model has decent predictive validity.
# set up data for LOO posterior predictive check
y <- model_df$lo_p_sup
yrep <- posterior_predict(m.wrkr.means.llo)
# run LOO to get weights
loo <- loo(m.wrkr.means.llo, save_psis = TRUE, cores = 2)
## Warning: Found 176 observations with a pareto_k > 0.7 in model
## 'm.wrkr.means.llo'. With this many problematic observations, it may be more
## appropriate to use 'kfold' with argument 'K = 10' to perform 10-fold cross-
## validation rather than LOO.
psis <- loo$psis_object
lw <- weights(psis)
ppc_loo_pit_qq(y, yrep, lw = lw)
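The warning above flags a nontrivial number of observations with high Pareto k values. A sketch of the k-fold alternative that the warning suggests (not run here, since `kfold` refits the model K times and is computationally costly):

```r
# 10-fold cross-validation as a more robust alternative to PSIS-LOO
# when many observations have pareto_k > 0.7 (not run)
kfold_res <- kfold(m.wrkr.means.llo, K = 10, chains = 2, cores = 2)
print(kfold_res)
```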

Let’s take a look at predictions per worker and visualization condition to get a more granular sense of our model fit.
model_check_df %>%
group_by(lo_ground_truth, worker_id, means) %>%
add_predicted_draws(m.wrkr.means.llo, n = 500) %>%
ggplot(aes(x = lo_ground_truth, y = lo_p_sup, color = condition, fill = condition)) +
geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = .25) +
geom_point(data = model_check_df) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
coord_cartesian(xlim = quantile(model_df$lo_ground_truth, c(0, 1)),
ylim = quantile(model_df$lo_p_sup, c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

What does this look like in probability units?
model_check_df %>%
group_by(lo_ground_truth, worker_id, means) %>%
add_predicted_draws(m.wrkr.means.llo, n = 500) %>%
ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_p_sup), color = condition, fill = condition)) +
geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = .25) +
geom_point(data = model_check_df) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
coord_cartesian(xlim = quantile(plogis(model_df$lo_ground_truth), c(0, 1)),
ylim = quantile(plogis(model_df$lo_p_sup), c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

To examine more closely whether our model has predictive validity at the level of each worker, we’ll look at QQ plots for residuals at the worker level.
model_check_df %>%
add_predicted_draws(m.wrkr.means.llo, n = 500) %>%
group_by(lo_ground_truth, worker_id) %>%
summarise(
p_residual = mean(.prediction < lo_p_sup), # what proportion of predicted judgments are less than the observed response?
z_residual = qnorm(p_residual) # what are the z-scores of these cumulative probabilities?
) %>%
ggplot(aes(sample = z_residual)) +
geom_qq() +
geom_abline() +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

These still look pretty terrible.
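As a sanity check on the residual logic itself (an illustration with simulated data, not our experiment data): when predictions are drawn from the same distribution that generated an observation, the proportion of predictive draws below the observation is uniform, and its probit transform is standard normal. Deviations from the QQ line therefore indicate model misfit rather than an artifact of the procedure.

```r
set.seed(1)
n_obs <- 1000
obs <- rnorm(n_obs, mean = 2, sd = 1.5) # pretend observations
# for each observation, the proportion of 500 "predictive draws" below it
p_residual <- sapply(obs, function(y) mean(rnorm(500, mean = 2, sd = 1.5) < y))
# clamp away exact 0s and 1s from the finite sample before the probit transform
p_residual <- pmin(pmax(p_residual, 1 / 501), 500 / 501)
z_residual <- qnorm(p_residual)
# z_residual is approximately standard normal, so qqnorm(z_residual) tracks the line
c(mean = mean(z_residual), sd = sd(z_residual))
```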
With this model we can take a first stab at addressing our research question about the presence of extrinsic means. What does the posterior for the slope of the LLO model look like when means are present vs absent, ignoring other manipulations for now? Since we are building a complex model, we’ll forgo calculating marginal effects by manually combining parameters. Instead we’ll use add_fitted_draws and compare_levels from tidybayes to get our slopes, and then we’ll take their weighted average, grouping by the parameters for which we want marginal effects.
model_df %>%
group_by(means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.wrkr.means.llo, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, .draw) %>% # group by predictors to keep
summarise(slope = weighted.mean(slope)) %>% # marginalize out visualization condition by taking a weighted average
ggplot(aes(x = slope, group = means, color = means, fill = means)) +
geom_density(alpha = 0.35) +
scale_x_continuous(expression(slope), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior for slopes for mean present/absent") +
theme(panel.grid = element_blank())

Recall that a slope of 1 represents no bias. This chart suggests that people are biased with or without adding means. We should not be surprised to see little to no effect in this model. The mean difference is a good heuristic for probability of superiority when variance of visualized estimates is high, but it is not a good heuristic when variance is low. Thus, we should expect to see the effect we are looking for as an interaction between the presence of the mean and the level of uncertainty.
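To make the slope interpretation concrete, here is a small illustration (base R, not tied to our data) of how a linear-in-log-odds response function with slope less than 1 compresses probability judgments toward the crossover point:

```r
# linear log odds (LLO) response function: judged probability as a function
# of true probability, with slope and intercept on the log odds scale
llo <- function(p, slope, intercept = 0) plogis(intercept + slope * qlogis(p))

p <- c(0.05, 0.25, 0.50, 0.75, 0.95)
round(llo(p, slope = 1), 2)   # slope of 1: judgments match the truth
round(llo(p, slope = 0.5), 2) # slope < 1: judgments pulled toward 0.5
                              # ≈ 0.19 0.37 0.50 0.63 0.81
```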
Level of Uncertainty Shown
Another factor that we manipulate is the level of uncertainty presented to chart users. We expect level of uncertainty (sd_diff) to determine the impact of extrinsic means on performance. To test this, we’ll add an interaction between sd_diff, means, and the ground truth.
We use the same priors as we did for the previous model. Now, let’s fit the model to our data.
# hierarchical LLO model
m.wrkr.means.sd.llo <- brm(data = model_df, family = "gaussian",
formula = bf(lo_p_sup ~ (1 + lo_ground_truth|sharecor|worker_id) + lo_ground_truth*means*sd_diff,
sigma ~ (1|sharecor|worker_id)),
prior = c(prior(normal(1, 0.5), class = b),
prior(normal(1.3, 1), class = Intercept),
prior(normal(0, 0.15), class = sd, group = worker_id),
# prior(normal(0, 0.3), class = b, dpar = sigma),
prior(normal(0, 0.15), class = sd, dpar = sigma),
prior(lkj(4), class = cor)),
iter = 3000, warmup = 500, chains = 2, cores = 2,
control = list(adapt_delta = 0.99, max_treedepth = 12),
file = "model-fits/llo_mdl-wrkr_means_sd")
Check diagnostics:
# trace plots
plot(m.wrkr.means.sd.llo)



# pairs plot (LLO params)
pairs(m.wrkr.means.sd.llo, exact_match = TRUE, pars = c("b_Intercept",
"b_lo_ground_truth",
"b_meansTRUE",
"b_sd_diff15",
"b_lo_ground_truth:meansTRUE",
"b_lo_ground_truth:sd_diff15",
"b_meansTRUE:sd_diff15",
"b_lo_ground_truth:meansTRUE:sd_diff15"))

# pairs plot (random effects on lo_p_sup)
pairs(m.wrkr.means.sd.llo, exact_match = TRUE, pars = c("sd_worker_id__Intercept",
"sd_worker_id__lo_ground_truth"))

# pairs plot (sigma params)
pairs(m.wrkr.means.sd.llo, exact_match = TRUE, pars = c("b_sigma_Intercept",
"sd_worker_id__sigma_Intercept"))

pairs(m.wrkr.means.sd.llo, pars = c("cor_worker_id__"))

# model summary
print(m.wrkr.means.sd.llo)
## Family: gaussian
## Links: mu = identity; sigma = log
## Formula: lo_p_sup ~ (1 + lo_ground_truth | sharecor | worker_id) + lo_ground_truth * means * sd_diff
## sigma ~ (1 | sharecor | worker_id)
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
## total post-warmup samples = 5000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept) 0.45 0.02 0.40 0.50 1.01
## sd(lo_ground_truth) 0.45 0.01 0.42 0.48 1.00
## sd(sigma_Intercept) 0.86 0.02 0.81 0.90 1.00
## cor(Intercept,lo_ground_truth) -0.24 0.05 -0.32 -0.14 1.01
## cor(Intercept,sigma_Intercept) -0.41 0.04 -0.49 -0.32 1.01
## cor(lo_ground_truth,sigma_Intercept) 0.58 0.03 0.52 0.63 1.00
## Bulk_ESS Tail_ESS
## sd(Intercept) 669 1433
## sd(lo_ground_truth) 348 787
## sd(sigma_Intercept) 831 2209
## cor(Intercept,lo_ground_truth) 323 807
## cor(Intercept,sigma_Intercept) 381 796
## cor(lo_ground_truth,sigma_Intercept) 1112 2331
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat
## Intercept -0.17 0.02 -0.20 -0.13 1.00
## sigma_Intercept -0.82 0.03 -0.89 -0.75 1.01
## lo_ground_truth 0.48 0.02 0.45 0.52 1.01
## meansTRUE -0.00 0.01 -0.02 0.01 1.00
## sd_diff15 0.03 0.01 0.02 0.05 1.00
## lo_ground_truth:meansTRUE -0.00 0.01 -0.01 0.00 1.00
## lo_ground_truth:sd_diff15 0.11 0.01 0.10 0.12 1.00
## meansTRUE:sd_diff15 0.02 0.01 -0.00 0.04 1.00
## lo_ground_truth:meansTRUE:sd_diff15 0.03 0.01 0.01 0.04 1.00
## Bulk_ESS Tail_ESS
## Intercept 320 857
## sigma_Intercept 385 1104
## lo_ground_truth 189 370
## meansTRUE 6553 4641
## sd_diff15 5668 4423
## lo_ground_truth:meansTRUE 6581 4439
## lo_ground_truth:sd_diff15 5686 4288
## meansTRUE:sd_diff15 5234 4241
## lo_ground_truth:meansTRUE:sd_diff15 5604 4249
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s check our posterior predictive distribution.
# posterior predictive check
model_df %>%
select(lo_ground_truth, worker_id, means, sd_diff) %>%
add_predicted_draws(m.wrkr.means.sd.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
mutate(
# transform to probability units
post_p_sup = plogis(lo_p_sup)
) %>%
ggplot(aes(x = post_p_sup)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior predictive distribution for probability of superiority") +
theme(panel.grid = element_blank())

How do these predictions compare to the observed data?
# data density
model_df %>%
ggplot(aes(x = p_superiority)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Data distribution for probability of superiority") +
theme(panel.grid = element_blank())

Running a leave-one-out posterior predictive check, we can see that overall this model has decent predictive validity.
# set up data for LOO posterior predictive check
y <- model_df$lo_p_sup
yrep <- posterior_predict(m.wrkr.means.sd.llo)
# run LOO to get weights
loo <- loo(m.wrkr.means.sd.llo, save_psis = TRUE, cores = 2)
## Warning: Found 208 observations with a pareto_k > 0.7 in model
## 'm.wrkr.means.sd.llo'. With this many problematic observations, it may be more
## appropriate to use 'kfold' with argument 'K = 10' to perform 10-fold cross-
## validation rather than LOO.
psis <- loo$psis_object
lw <- weights(psis)
ppc_loo_pit_qq(y, yrep, lw = lw)

Let’s take a look at predictions per worker and visualization condition to get a more granular sense of our model fit.
model_check_df %>%
group_by(lo_ground_truth, worker_id, means, sd_diff) %>%
add_predicted_draws(m.wrkr.means.sd.llo, n = 500) %>%
ggplot(aes(x = lo_ground_truth, y = lo_p_sup, color = condition, fill = condition)) +
geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = .25) +
geom_point(data = model_check_df) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
coord_cartesian(xlim = quantile(model_df$lo_ground_truth, c(0, 1)),
ylim = quantile(model_df$lo_p_sup, c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

What does this look like in probability units?
model_check_df %>%
group_by(lo_ground_truth, worker_id, means, sd_diff) %>%
add_predicted_draws(m.wrkr.means.sd.llo, n = 500) %>%
ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_p_sup), color = condition, fill = condition)) +
geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = .25) +
geom_point(data = model_check_df) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
coord_cartesian(xlim = quantile(plogis(model_df$lo_ground_truth), c(0, 1)),
ylim = quantile(plogis(model_df$lo_p_sup), c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

To examine more closely whether our model has predictive validity at the level of each worker, we’ll look at QQ plots for residuals at the worker level.
model_check_df %>%
add_predicted_draws(m.wrkr.means.sd.llo, n = 500) %>%
group_by(lo_ground_truth, worker_id) %>%
summarise(
p_residual = mean(.prediction < lo_p_sup), # what proportion of predicted judgments are less than the observed response?
z_residual = qnorm(p_residual) # what are the z-scores of these cumulative probabilities?
) %>%
ggplot(aes(sample = z_residual)) +
geom_qq() +
geom_abline() +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

These still look pretty terrible.
What does the posterior for the slope of the LLO model look like when means are present vs absent at different levels of uncertainty, ignoring other manipulations?
model_df %>%
group_by(means, sd_diff) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.wrkr.means.sd.llo, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, sd_diff, .draw) %>% # group by predictors to keep
summarise(slope = weighted.mean(slope)) %>% # marginalize out visualization condition by taking a weighted average
ggplot(aes(x = slope, group = means, color = means, fill = means)) +
geom_density(alpha = 0.35) +
scale_x_continuous(expression(slope), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior for slopes for mean present/absent") +
theme(panel.grid = element_blank()) +
facet_grid(. ~ sd_diff)

Recall that a slope of 1 represents no bias. Overall, people seem less biased at baseline when uncertainty is higher. With regard to the interaction, we see about what we expect: adding means makes responses less biased when uncertainty is high. However, we also expected the opposite effect, that adding means would make people more biased when uncertainty is low. Maybe this will turn out to be the case only for some uncertainty visualization formats rather than across the board.
Visualization Condition
The other thing we really want to know about is the impact of visualization condition on the slopes of linear models in log odds space. Do some visualizations lead to more extreme patterns of bias than others? To test this, we’ll add an interaction between visualization condition and the ground truth. Now we have all our predictors of interest in one model (i.e., this will be the minimal model required to answer our research questions).
We use the same priors as we did for the previous model. Now, let’s fit the model to our data.
# minimal LLO model
m.m.llo <- brm(data = model_df, family = "gaussian",
formula = bf(lo_p_sup ~ (1 + lo_ground_truth|sharecor|worker_id) + lo_ground_truth*means*sd_diff*condition,
sigma ~ (1|sharecor|worker_id)),
prior = c(prior(normal(1, 0.5), class = b),
prior(normal(1.3, 1), class = Intercept),
prior(normal(0, 0.15), class = sd, group = worker_id),
# prior(normal(0, 0.3), class = b, dpar = sigma),
prior(normal(0, 0.15), class = sd, dpar = sigma),
prior(lkj(4), class = cor)),
iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
control = list(adapt_delta = 0.99, max_treedepth = 12),
file = "model-fits/llo_mdl-minimal")
Check diagnostics:
# trace plots
plot(m.m.llo)








# pairs plot (intercepts)
pairs(m.m.llo, exact_match = TRUE, pars = c("b_Intercept",
"b_lo_ground_truth",
"b_meansTRUE",
"b_sd_diff15",
"b_conditionintervals",
"b_meansTRUE:sd_diff15",
"b_meansTRUE:conditionintervals",
"b_sd_diff15:conditionintervals",
"b_meansTRUE:sd_diff15:conditionintervals"))

# pairs plot (LLO slopes)
pairs(m.m.llo, exact_match = TRUE, pars = c("b_lo_ground_truth:meansTRUE",
"b_lo_ground_truth:sd_diff15",
"b_lo_ground_truth:conditionintervals",
"b_lo_ground_truth:meansTRUE:sd_diff15",
"b_lo_ground_truth:meansTRUE:conditionintervals",
"b_lo_ground_truth:sd_diff15:conditionintervals",
"b_lo_ground_truth:meansTRUE:sd_diff15:conditionintervals"))

# pairs plot (random effects)
pairs(m.m.llo, exact_match = TRUE, pars = c("b_sigma_Intercept",
"sd_worker_id__Intercept",
"sd_worker_id__lo_ground_truth",
"sd_worker_id__sigma_Intercept"))

pairs(m.m.llo, pars = c("cor_worker_id__"))

# model summary
print(m.m.llo)
## Family: gaussian
## Links: mu = identity; sigma = log
## Formula: lo_p_sup ~ (1 + lo_ground_truth | sharecor | worker_id) + lo_ground_truth * means * sd_diff * condition
## sigma ~ (1 | sharecor | worker_id)
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
## total post-warmup samples = 10000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept) 0.45 0.02 0.41 0.50 1.00
## sd(lo_ground_truth) 0.45 0.01 0.42 0.48 1.00
## sd(sigma_Intercept) 0.86 0.02 0.81 0.90 1.00
## cor(Intercept,lo_ground_truth) -0.26 0.05 -0.34 -0.17 1.00
## cor(Intercept,sigma_Intercept) -0.42 0.04 -0.50 -0.33 1.00
## cor(lo_ground_truth,sigma_Intercept) 0.60 0.03 0.55 0.66 1.00
## Bulk_ESS Tail_ESS
## sd(Intercept) 3094 6130
## sd(lo_ground_truth) 1764 3623
## sd(sigma_Intercept) 4127 6551
## cor(Intercept,lo_ground_truth) 1764 3842
## cor(Intercept,sigma_Intercept) 2206 4631
## cor(lo_ground_truth,sigma_Intercept) 4429 6923
##
## Population-Level Effects:
## Estimate Est.Error
## Intercept -0.20 0.04
## sigma_Intercept -0.83 0.04
## lo_ground_truth 0.50 0.03
## meansTRUE -0.02 0.02
## sd_diff15 0.04 0.02
## conditionHOPs 0.07 0.06
## conditionintervals -0.07 0.05
## conditionQDPs 0.12 0.05
## lo_ground_truth:meansTRUE 0.01 0.01
## lo_ground_truth:sd_diff15 0.10 0.01
## meansTRUE:sd_diff15 0.04 0.02
## lo_ground_truth:conditionHOPs -0.12 0.05
## lo_ground_truth:conditionintervals -0.04 0.04
## lo_ground_truth:conditionQDPs 0.07 0.04
## meansTRUE:conditionHOPs 0.04 0.03
## meansTRUE:conditionintervals 0.02 0.02
## meansTRUE:conditionQDPs -0.00 0.02
## sd_diff15:conditionHOPs 0.03 0.03
## sd_diff15:conditionintervals -0.00 0.02
## sd_diff15:conditionQDPs -0.04 0.02
## lo_ground_truth:meansTRUE:sd_diff15 0.03 0.01
## lo_ground_truth:meansTRUE:conditionHOPs -0.02 0.02
## lo_ground_truth:meansTRUE:conditionintervals -0.02 0.01
## lo_ground_truth:meansTRUE:conditionQDPs -0.00 0.01
## lo_ground_truth:sd_diff15:conditionHOPs 0.04 0.02
## lo_ground_truth:sd_diff15:conditionintervals -0.00 0.01
## lo_ground_truth:sd_diff15:conditionQDPs 0.04 0.01
## meansTRUE:sd_diff15:conditionHOPs -0.01 0.04
## meansTRUE:sd_diff15:conditionintervals -0.03 0.03
## meansTRUE:sd_diff15:conditionQDPs -0.02 0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.02 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 0.02 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.01 0.02
## l-95% CI u-95% CI Rhat
## Intercept -0.27 -0.12 1.00
## sigma_Intercept -0.90 -0.76 1.00
## lo_ground_truth 0.44 0.57 1.00
## meansTRUE -0.05 0.01 1.00
## sd_diff15 0.01 0.07 1.00
## conditionHOPs -0.04 0.18 1.00
## conditionintervals -0.18 0.03 1.00
## conditionQDPs 0.02 0.23 1.00
## lo_ground_truth:meansTRUE -0.01 0.02 1.00
## lo_ground_truth:sd_diff15 0.08 0.12 1.00
## meansTRUE:sd_diff15 -0.00 0.08 1.00
## lo_ground_truth:conditionHOPs -0.21 -0.03 1.00
## lo_ground_truth:conditionintervals -0.13 0.04 1.00
## lo_ground_truth:conditionQDPs -0.01 0.15 1.00
## meansTRUE:conditionHOPs -0.01 0.10 1.00
## meansTRUE:conditionintervals -0.02 0.06 1.00
## meansTRUE:conditionQDPs -0.04 0.04 1.00
## sd_diff15:conditionHOPs -0.02 0.08 1.00
## sd_diff15:conditionintervals -0.04 0.04 1.00
## sd_diff15:conditionQDPs -0.09 0.00 1.00
## lo_ground_truth:meansTRUE:sd_diff15 -0.00 0.05 1.00
## lo_ground_truth:meansTRUE:conditionHOPs -0.05 0.02 1.00
## lo_ground_truth:meansTRUE:conditionintervals -0.05 0.01 1.00
## lo_ground_truth:meansTRUE:conditionQDPs -0.03 0.02 1.00
## lo_ground_truth:sd_diff15:conditionHOPs 0.01 0.07 1.00
## lo_ground_truth:sd_diff15:conditionintervals -0.03 0.03 1.00
## lo_ground_truth:sd_diff15:conditionQDPs 0.01 0.07 1.00
## meansTRUE:sd_diff15:conditionHOPs -0.09 0.06 1.00
## meansTRUE:sd_diff15:conditionintervals -0.09 0.02 1.00
## meansTRUE:sd_diff15:conditionQDPs -0.08 0.04 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.07 0.03 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals -0.02 0.05 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.04 0.03 1.00
## Bulk_ESS Tail_ESS
## Intercept 1793 3810
## sigma_Intercept 1942 3613
## lo_ground_truth 966 2186
## meansTRUE 5644 7812
## sd_diff15 5363 7543
## conditionHOPs 2261 3794
## conditionintervals 1896 3554
## conditionQDPs 1953 3552
## lo_ground_truth:meansTRUE 5462 7786
## lo_ground_truth:sd_diff15 5334 7669
## meansTRUE:sd_diff15 5078 7243
## lo_ground_truth:conditionHOPs 1264 2569
## lo_ground_truth:conditionintervals 1023 2577
## lo_ground_truth:conditionQDPs 1057 2069
## meansTRUE:conditionHOPs 6479 8398
## meansTRUE:conditionintervals 6274 8161
## meansTRUE:conditionQDPs 5962 8272
## sd_diff15:conditionHOPs 6523 8395
## sd_diff15:conditionintervals 6088 8214
## sd_diff15:conditionQDPs 5799 8116
## lo_ground_truth:meansTRUE:sd_diff15 4816 6637
## lo_ground_truth:meansTRUE:conditionHOPs 6561 8511
## lo_ground_truth:meansTRUE:conditionintervals 6201 8234
## lo_ground_truth:meansTRUE:conditionQDPs 6154 8418
## lo_ground_truth:sd_diff15:conditionHOPs 6523 8631
## lo_ground_truth:sd_diff15:conditionintervals 5991 7867
## lo_ground_truth:sd_diff15:conditionQDPs 5894 8005
## meansTRUE:sd_diff15:conditionHOPs 6118 8078
## meansTRUE:sd_diff15:conditionintervals 5874 7813
## meansTRUE:sd_diff15:conditionQDPs 5598 7665
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 6213 8160
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 5743 7297
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 5529 8172
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s check our posterior predictive distribution.
# posterior predictive check
model_df %>%
select(lo_ground_truth, worker_id, means, sd_diff, condition) %>%
add_predicted_draws(m.m.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
mutate(
# transform to probability units
post_p_sup = plogis(lo_p_sup)
) %>%
ggplot(aes(x = post_p_sup)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior predictive distribution for probability of superiority") +
theme(panel.grid = element_blank())

How do these predictions compare to the observed data?
# data density
model_df %>%
ggplot(aes(x = p_superiority)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Data distribution for probability of superiority") +
theme(panel.grid = element_blank())

Running a leave-one-out posterior predictive check, we can see that overall this model has decent predictive validity.
# set up data for LOO posterior predictive check
y <- model_df$lo_p_sup
yrep <- posterior_predict(m.m.llo)
# run LOO to get weights
loo <- loo(m.m.llo, save_psis = TRUE, cores = 2)
## Warning: Found 198 observations with a pareto_k > 0.7 in model 'm.m.llo'. With
## this many problematic observations, it may be more appropriate to use 'kfold'
## with argument 'K = 10' to perform 10-fold cross-validation rather than LOO.
psis <- loo$psis_object
lw <- weights(psis)
ppc_loo_pit_qq(y, yrep, lw = lw)

Let’s take a look at predictions per worker and visualization condition to get a more granular sense of our model fit.
model_check_df %>%
group_by(lo_ground_truth, worker_id, means, sd_diff, condition) %>%
add_predicted_draws(m.m.llo, n = 500) %>%
ggplot(aes(x = lo_ground_truth, y = lo_p_sup, color = condition, fill = condition)) +
geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = .25) +
geom_point(data = model_check_df) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
coord_cartesian(xlim = quantile(model_df$lo_ground_truth, c(0, 1)),
ylim = quantile(model_df$lo_p_sup, c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

What does this look like in probability units?
model_check_df %>%
group_by(lo_ground_truth, worker_id, means, sd_diff, condition) %>%
add_predicted_draws(m.m.llo, n = 500) %>%
ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_p_sup), color = condition, fill = condition)) +
geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = .25) +
geom_point(data = model_check_df) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
coord_cartesian(xlim = quantile(plogis(model_df$lo_ground_truth), c(0, 1)),
ylim = quantile(plogis(model_df$lo_p_sup), c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

To examine more closely whether our model has predictive validity at the level of each worker, we’ll look at QQ plots for residuals at the worker level.
model_check_df %>%
add_predicted_draws(m.m.llo, n = 500) %>%
group_by(lo_ground_truth, worker_id) %>%
summarise(
p_residual = mean(.prediction < lo_p_sup), # what proportion of predicted judgments are less than the observed response?
z_residual = qnorm(p_residual) # what are the z-scores of these cumulative probabilities?
) %>%
ggplot(aes(sample = z_residual)) +
geom_qq() +
geom_abline() +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

These still look pretty terrible.
What does the posterior for the slope of the LLO model look like when means are present vs absent at different levels of uncertainty, ignoring other manipulations?
model_df %>%
group_by(means, sd_diff, condition) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.m.llo, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, sd_diff, .draw) %>% # group by predictors to keep
summarise(slope = weighted.mean(slope)) %>% # marginalize out visualization condition by taking a weighted average
ggplot(aes(x = slope, group = means, color = means, fill = means)) +
geom_density(alpha = 0.35) +
scale_x_continuous(expression(slope), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior for slopes for mean present/absent") +
theme(panel.grid = element_blank()) +
facet_grid(. ~ sd_diff)

This effect suggests that adding means has a debiasing effect on average when visualized uncertainty is high (marginalizing across visualization conditions). Again, this is about what we expected to see. However, we expected the mean to have a biasing effect when uncertainty is low.
Let’s look at this difference in a forest-plot-style display.
model_df %>%
group_by(means, sd_diff, condition) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.m.llo, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
compare_levels(.value, by = means) %>% # look at differences in slopes between means present vs absent
rename(slope_diff = .value) %>%
group_by(sd_diff, .draw) %>% # group by predictors to keep
summarise(slope_diff = weighted.mean(slope_diff)) %>% # marginalize out visualization condition by taking a weighted average
ggplot(aes(x = slope_diff, y = sd_diff)) +
stat_halfeyeh() +
scale_x_continuous(expression(slope_diff), expand = c(0, 0)) +
labs(subtitle = "Posterior differences in slopes for means present vs absent") +
theme_bw()

What does the posterior for the slope in each visualization condition look like, marginalizing across other factors?
model_df %>%
group_by(means, sd_diff, condition) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.m.llo, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(condition, .draw) %>% # group by predictors to keep
summarise(slope = weighted.mean(slope)) %>% # marginalize out means present/absent and uncertainty level by taking a weighted average
ggplot(aes(x = slope, group = condition, color = condition, fill = condition)) +
geom_density(alpha = 0.35) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
scale_x_continuous(expression(slope), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior for slopes by visualization condition") +
theme(panel.grid = element_blank())

Recall that a slope of 1 on the logit scale reflects no bias. This suggests that users are biased toward responses of 50% on the probability scale in all conditions but to different degrees. Quantile dotplots seem to have a substantial debiasing effect on effect size judgments when we marginalize across other manipulations.
What if we break these marginal effects down into simple effects for the interaction of the presence/absence of the mean, level of visualized uncertainty, and visualization condition?
model_df %>%
group_by(means, sd_diff, condition) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.m.llo, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
ggplot(aes(x = slope, group = means, color = means, fill = means)) +
geom_density(alpha = 0.35) +
scale_x_continuous(expression(slope), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior for slopes for means * sd * visualization condition") +
theme(panel.grid = element_blank()) +
facet_grid(condition ~ sd_diff)

Again, this is what we expected to see. However, it is not completely clear from this chart whether the simple effect of extrinsic means is reliable in some conditions.
Let’s look at the differences in a forest-plot-style display, which should make the reliability of these differences a little easier to assess visually.
model_df %>%
  group_by(means, sd_diff, condition) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.m.llo, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
  compare_levels(.value, by = means) %>% # look at differences in slopes between means present vs absent
  rename(slope_diff = .value) %>%
  unite(cond, condition, sd_diff, sep = "_", remove = FALSE) %>%
  ggplot(aes(x = slope_diff, y = cond)) +
  stat_halfeyeh() +
  scale_x_continuous(expression(slope_diff), expand = c(0, 0)) +
  labs(subtitle = "Posterior differences in slopes for means present vs absent") +
  theme_bw()
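The compare_levels() trick used above works because, for a linear model, the difference between fitted values at lo_ground_truth = 1 and lo_ground_truth = 0 is exactly the slope. A minimal base R illustration with arbitrary example coefficients:

```r
# for a linear predictor y = a + b * x, the fit at x = 1 minus the fit
# at x = 0 recovers the slope b; arbitrary example coefficients
a <- -0.2
b <- 0.7
fit_at <- function(x) a + b * x
fit_at(1) - fit_at(0) # 0.7, i.e., the slope b
```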

What is the predicted pattern for responses for the average worker in each cell of this interaction?
model_df %>%
  group_by(lo_ground_truth, means, sd_diff, condition) %>%
  add_predicted_draws(m.m.llo, re_formula = NA, n = 500) %>%
  ggplot(aes(x = plogis(lo_ground_truth), y = plogis(.prediction), color = means, fill = means)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95), alpha = .25) +
  coord_cartesian(xlim = quantile(plogis(model_df$lo_ground_truth), c(0, 1)),
                  ylim = quantile(plogis(model_df$lo_p_sup), c(0, 1))) +
  theme_bw() +
  theme(panel.grid.minor = element_blank()) +
  facet_grid(condition ~ sd_diff)

In these plots of the overall response function, we can see that the difference in performance induced by showing means is small relative to the difference between visualization conditions. We can also see that people are by far the least likely to underestimate effect size with quantile dotplots.
Next, we’ll try to get more precise estimates by expanding our random effects to include all of the within-subjects manipulations in our study design.
Building Up Random Effects for Within-Subjects Manipulations
In the minimal model we used to answer our research questions above, estimates for the effect of means are noisier than we would like, and predictive validity within subjects is not great. We’ll try to better account for heterogeneity across subjects by adding random effects to our model for each within-subjects manipulation.
Following the principle of model expansion, we make these changes cumulatively. Below, we present a series of model specifications that capture plausible structure in the data, noting where sampling issues arise.
Random Effects for the Interaction of Means and Uncertainty Shown
This first model adds random effects for the within-subjects manipulations in our previous model. We prioritize the interaction between showing means and the level of uncertainty in the distributions since we had a hypothesis about this interaction. We omit the interaction between these terms and the ground truth from the random effects specification because of fit issues: we are unable to identify the random effect of ground truth, means, and level of uncertainty with only one observation per unique combination of these variables per worker.
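The identification problem can be seen in the structure of the design itself. Here is a sketch with a hypothetical toy design (three workers, each seeing the full crossing of conditions once) showing that a worker-level coefficient for the full interaction would have to be estimated from a single observation per cell:

```r
library(dplyr)

# toy version of a within-subjects design: each worker sees each unique
# combination of ground truth, means, and uncertainty level exactly once
toy <- expand.grid(worker_id = 1:3,
                   lo_ground_truth = c(-1, 0, 1),
                   means = c(FALSE, TRUE),
                   sd_diff = c(5, 15))
toy %>%
  count(worker_id, lo_ground_truth, means, sd_diff) %>%
  pull(n) %>%
  max() # 1: a per-worker effect for the full interaction would be
        # estimated from a single observation per cell
```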
# minimal LLO model with random effects for means and sd_diff
m.m.llo.r_means.sd <- brm(
  data = model_df, family = "gaussian",
  formula = bf(
    lo_p_sup ~ (1 + lo_ground_truth + means*sd_diff|sharecor|worker_id) + lo_ground_truth*means*sd_diff*condition,
    sigma ~ (1|sharecor|worker_id)
  ),
  prior = c(prior(normal(1, 0.5), class = b),
            prior(normal(1.3, 1), class = Intercept),
            prior(normal(0, 0.15), class = sd, group = worker_id),
            # prior(normal(0, 0.3), class = b, dpar = sigma),
            prior(normal(0, 0.15), class = sd, dpar = sigma),
            prior(lkj(4), class = cor)),
  iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
  control = list(adapt_delta = 0.99, max_treedepth = 12),
  file = "model-fits/llo_mdl-min-r_means_sd")
summary(m.m.llo.r_means.sd)
## Family: gaussian
## Links: mu = identity; sigma = log
## Formula: lo_p_sup ~ (1 + lo_ground_truth + means * sd_diff | sharecor | worker_id) + lo_ground_truth * means * sd_diff * condition
## sigma ~ (1 | sharecor | worker_id)
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
## total post-warmup samples = 10000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI u-95% CI
## sd(Intercept) 0.53 0.02 0.48 0.58
## sd(lo_ground_truth) 0.45 0.01 0.42 0.48
## sd(meansTRUE) 0.22 0.02 0.19 0.26
## sd(sd_diff15) 0.17 0.01 0.15 0.19
## sd(meansTRUE:sd_diff15) 0.09 0.01 0.07 0.12
## sd(sigma_Intercept) 0.87 0.02 0.83 0.92
## cor(Intercept,lo_ground_truth) -0.22 0.05 -0.31 -0.12
## cor(Intercept,meansTRUE) 0.09 0.09 -0.08 0.26
## cor(lo_ground_truth,meansTRUE) -0.32 0.07 -0.45 -0.19
## cor(Intercept,sd_diff15) -0.55 0.07 -0.68 -0.40
## cor(lo_ground_truth,sd_diff15) 0.06 0.08 -0.09 0.21
## cor(meansTRUE,sd_diff15) -0.17 0.09 -0.35 0.01
## cor(Intercept,meansTRUE:sd_diff15) -0.40 0.16 -0.68 -0.06
## cor(lo_ground_truth,meansTRUE:sd_diff15) 0.32 0.13 0.04 0.57
## cor(meansTRUE,meansTRUE:sd_diff15) 0.39 0.15 0.08 0.67
## cor(sd_diff15,meansTRUE:sd_diff15) -0.04 0.14 -0.30 0.23
## cor(Intercept,sigma_Intercept) -0.33 0.04 -0.41 -0.24
## cor(lo_ground_truth,sigma_Intercept) 0.62 0.03 0.57 0.67
## cor(meansTRUE,sigma_Intercept) -0.35 0.06 -0.46 -0.24
## cor(sd_diff15,sigma_Intercept) 0.25 0.07 0.12 0.38
## cor(meansTRUE:sd_diff15,sigma_Intercept) 0.30 0.11 0.07 0.51
## Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.00 4744 7044
## sd(lo_ground_truth) 1.00 1817 4376
## sd(meansTRUE) 1.00 3855 6866
## sd(sd_diff15) 1.00 7043 8325
## sd(meansTRUE:sd_diff15) 1.00 5959 8574
## sd(sigma_Intercept) 1.00 5236 8258
## cor(Intercept,lo_ground_truth) 1.00 1744 3359
## cor(Intercept,meansTRUE) 1.00 4785 8137
## cor(lo_ground_truth,meansTRUE) 1.00 5443 8071
## cor(Intercept,sd_diff15) 1.00 7985 8401
## cor(lo_ground_truth,sd_diff15) 1.00 5183 7860
## cor(meansTRUE,sd_diff15) 1.00 5849 7964
## cor(Intercept,meansTRUE:sd_diff15) 1.00 8132 8301
## cor(lo_ground_truth,meansTRUE:sd_diff15) 1.00 5901 8257
## cor(meansTRUE,meansTRUE:sd_diff15) 1.00 5892 7990
## cor(sd_diff15,meansTRUE:sd_diff15) 1.00 8053 9032
## cor(Intercept,sigma_Intercept) 1.00 2686 4759
## cor(lo_ground_truth,sigma_Intercept) 1.00 5610 8158
## cor(meansTRUE,sigma_Intercept) 1.00 3260 6721
## cor(sd_diff15,sigma_Intercept) 1.00 3166 6162
## cor(meansTRUE:sd_diff15,sigma_Intercept) 1.00 1588 4597
##
## Population-Level Effects:
## Estimate Est.Error
## Intercept -0.19 0.05
## sigma_Intercept -0.90 0.04
## lo_ground_truth 0.51 0.03
## meansTRUE -0.10 0.03
## sd_diff15 0.06 0.02
## conditionHOPs 0.06 0.07
## conditionintervals -0.11 0.06
## conditionQDPs 0.13 0.06
## lo_ground_truth:meansTRUE 0.00 0.01
## lo_ground_truth:sd_diff15 0.09 0.01
## meansTRUE:sd_diff15 0.08 0.03
## lo_ground_truth:conditionHOPs -0.13 0.04
## lo_ground_truth:conditionintervals -0.05 0.04
## lo_ground_truth:conditionQDPs 0.06 0.04
## meansTRUE:conditionHOPs 0.06 0.04
## meansTRUE:conditionintervals 0.02 0.03
## meansTRUE:conditionQDPs -0.01 0.03
## sd_diff15:conditionHOPs 0.06 0.03
## sd_diff15:conditionintervals 0.06 0.03
## sd_diff15:conditionQDPs -0.03 0.03
## lo_ground_truth:meansTRUE:sd_diff15 0.03 0.01
## lo_ground_truth:meansTRUE:conditionHOPs -0.01 0.02
## lo_ground_truth:meansTRUE:conditionintervals -0.02 0.01
## lo_ground_truth:meansTRUE:conditionQDPs -0.00 0.01
## lo_ground_truth:sd_diff15:conditionHOPs 0.05 0.02
## lo_ground_truth:sd_diff15:conditionintervals -0.00 0.01
## lo_ground_truth:sd_diff15:conditionQDPs 0.04 0.01
## meansTRUE:sd_diff15:conditionHOPs -0.02 0.04
## meansTRUE:sd_diff15:conditionintervals -0.02 0.03
## meansTRUE:sd_diff15:conditionQDPs -0.03 0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.02 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 0.02 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.01 0.02
## l-95% CI u-95% CI Rhat
## Intercept -0.28 -0.10 1.00
## sigma_Intercept -0.97 -0.83 1.00
## lo_ground_truth 0.45 0.58 1.00
## meansTRUE -0.15 -0.05 1.00
## sd_diff15 0.02 0.11 1.00
## conditionHOPs -0.07 0.19 1.00
## conditionintervals -0.23 0.01 1.00
## conditionQDPs 0.01 0.25 1.00
## lo_ground_truth:meansTRUE -0.02 0.02 1.00
## lo_ground_truth:sd_diff15 0.08 0.11 1.00
## meansTRUE:sd_diff15 0.03 0.13 1.00
## lo_ground_truth:conditionHOPs -0.22 -0.04 1.00
## lo_ground_truth:conditionintervals -0.13 0.04 1.00
## lo_ground_truth:conditionQDPs -0.02 0.14 1.00
## meansTRUE:conditionHOPs -0.01 0.14 1.00
## meansTRUE:conditionintervals -0.05 0.09 1.00
## meansTRUE:conditionQDPs -0.07 0.06 1.00
## sd_diff15:conditionHOPs -0.01 0.12 1.00
## sd_diff15:conditionintervals -0.00 0.12 1.00
## sd_diff15:conditionQDPs -0.09 0.03 1.00
## lo_ground_truth:meansTRUE:sd_diff15 0.00 0.05 1.00
## lo_ground_truth:meansTRUE:conditionHOPs -0.04 0.02 1.00
## lo_ground_truth:meansTRUE:conditionintervals -0.04 0.01 1.00
## lo_ground_truth:meansTRUE:conditionQDPs -0.02 0.02 1.00
## lo_ground_truth:sd_diff15:conditionHOPs 0.02 0.08 1.00
## lo_ground_truth:sd_diff15:conditionintervals -0.03 0.02 1.00
## lo_ground_truth:sd_diff15:conditionQDPs 0.01 0.06 1.00
## meansTRUE:sd_diff15:conditionHOPs -0.10 0.05 1.00
## meansTRUE:sd_diff15:conditionintervals -0.08 0.04 1.00
## meansTRUE:sd_diff15:conditionQDPs -0.09 0.03 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.07 0.02 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals -0.01 0.05 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.04 0.03 1.00
## Bulk_ESS Tail_ESS
## Intercept 2178 3483
## sigma_Intercept 1806 4444
## lo_ground_truth 994 2310
## meansTRUE 3578 6955
## sd_diff15 3726 6032
## conditionHOPs 2638 4510
## conditionintervals 2137 3919
## conditionQDPs 1899 4165
## lo_ground_truth:meansTRUE 7581 8731
## lo_ground_truth:sd_diff15 6164 8394
## meansTRUE:sd_diff15 2916 6384
## lo_ground_truth:conditionHOPs 1484 3123
## lo_ground_truth:conditionintervals 1245 2880
## lo_ground_truth:conditionQDPs 1126 2836
## meansTRUE:conditionHOPs 5553 7867
## meansTRUE:conditionintervals 4722 7005
## meansTRUE:conditionQDPs 5302 7618
## sd_diff15:conditionHOPs 5675 8021
## sd_diff15:conditionintervals 5095 7190
## sd_diff15:conditionQDPs 4898 7203
## lo_ground_truth:meansTRUE:sd_diff15 7087 7121
## lo_ground_truth:meansTRUE:conditionHOPs 7713 8511
## lo_ground_truth:meansTRUE:conditionintervals 7422 9157
## lo_ground_truth:meansTRUE:conditionQDPs 7351 8335
## lo_ground_truth:sd_diff15:conditionHOPs 7175 8510
## lo_ground_truth:sd_diff15:conditionintervals 6643 8050
## lo_ground_truth:sd_diff15:conditionQDPs 6421 7804
## meansTRUE:sd_diff15:conditionHOPs 7628 8604
## meansTRUE:sd_diff15:conditionintervals 6466 8202
## meansTRUE:sd_diff15:conditionQDPs 6951 8850
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 7429 8462
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 6859 8629
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 6832 7506
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Mixed Effects of Ground Truth on Sigma
In this model, we add fixed and random effects of ground truth to our sigma submodel. To model fixed effects on sigma, we add a conservative but informative prior on these coefficients.
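Because sigma is modeled with a log link, a normal(0, 0.3) prior on its coefficients acts multiplicatively on the residual standard deviation. A quick check of the scale this prior implies:

```r
# 95% interval of a normal(0, 0.3) prior, exponentiated to show the implied
# multiplicative effect on residual standard deviation under the log link
exp(qnorm(c(0.025, 0.975), mean = 0, sd = 0.3))
# about 0.56 to 1.80: a priori, a predictor can shrink or inflate sigma
# by at most roughly a factor of 1.8
```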
# minimal LLO model with random effects for means, sd_diff, as well as ground truth for sigma submodel
m.m.llo.r_means.sd.sigma_gt <- brm(
  data = model_df, family = "gaussian",
  formula = bf(
    lo_p_sup ~ (1 + lo_ground_truth + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff*condition,
    sigma ~ (1 + lo_ground_truth|worker_id) + lo_ground_truth
  ),
  prior = c(prior(normal(1, 0.5), class = b),
            prior(normal(1.3, 1), class = Intercept),
            prior(normal(0, 0.15), class = sd, group = worker_id),
            prior(normal(0, 0.3), class = b, dpar = sigma),
            prior(normal(0, 0.15), class = sd, dpar = sigma),
            prior(lkj(4), class = cor)),
  iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
  control = list(adapt_delta = 0.99, max_treedepth = 12),
  file = "model-fits/llo_mdl-min-r_means_sd_sigma_gt")
summary(m.m.llo.r_means.sd.sigma_gt)
## Family: gaussian
## Links: mu = identity; sigma = log
## Formula: lo_p_sup ~ (1 + lo_ground_truth + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff * condition
## sigma ~ (1 + lo_ground_truth | worker_id) + lo_ground_truth
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
## total post-warmup samples = 10000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI u-95% CI
## sd(Intercept) 0.08 0.01 0.06 0.09
## sd(lo_ground_truth) 0.41 0.01 0.39 0.44
## sd(meansTRUE) 0.08 0.01 0.06 0.09
## sd(sd_diff15) 0.08 0.01 0.07 0.10
## sd(meansTRUE:sd_diff15) 0.07 0.01 0.06 0.09
## sd(sigma_Intercept) 1.24 0.03 1.18 1.31
## sd(sigma_lo_ground_truth) 0.44 0.01 0.42 0.47
## cor(Intercept,lo_ground_truth) -0.28 0.09 -0.45 -0.11
## cor(Intercept,meansTRUE) -0.39 0.10 -0.58 -0.17
## cor(lo_ground_truth,meansTRUE) -0.55 0.08 -0.69 -0.39
## cor(Intercept,sd_diff15) 0.11 0.11 -0.11 0.32
## cor(lo_ground_truth,sd_diff15) -0.02 0.09 -0.21 0.16
## cor(meansTRUE,sd_diff15) -0.05 0.11 -0.27 0.17
## cor(Intercept,meansTRUE:sd_diff15) -0.55 0.11 -0.75 -0.32
## cor(lo_ground_truth,meansTRUE:sd_diff15) 0.32 0.12 0.08 0.55
## cor(meansTRUE,meansTRUE:sd_diff15) 0.39 0.14 0.12 0.65
## cor(sd_diff15,meansTRUE:sd_diff15) -0.35 0.11 -0.54 -0.12
## cor(sigma_Intercept,sigma_lo_ground_truth) -0.73 0.02 -0.76 -0.69
## Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.00 4504 6619
## sd(lo_ground_truth) 1.00 3435 6615
## sd(meansTRUE) 1.00 2234 5709
## sd(sd_diff15) 1.00 4784 6933
## sd(meansTRUE:sd_diff15) 1.00 6150 8421
## sd(sigma_Intercept) 1.00 2726 5021
## sd(sigma_lo_ground_truth) 1.00 3833 6548
## cor(Intercept,lo_ground_truth) 1.00 474 1041
## cor(Intercept,meansTRUE) 1.00 1542 4459
## cor(lo_ground_truth,meansTRUE) 1.00 4058 7074
## cor(Intercept,sd_diff15) 1.00 3780 5306
## cor(lo_ground_truth,sd_diff15) 1.00 4688 7818
## cor(meansTRUE,sd_diff15) 1.00 3444 5828
## cor(Intercept,meansTRUE:sd_diff15) 1.00 3324 6593
## cor(lo_ground_truth,meansTRUE:sd_diff15) 1.00 7082 8578
## cor(meansTRUE,meansTRUE:sd_diff15) 1.00 5380 7810
## cor(sd_diff15,meansTRUE:sd_diff15) 1.00 6824 8312
## cor(sigma_Intercept,sigma_lo_ground_truth) 1.00 4256 6455
##
## Population-Level Effects:
## Estimate Est.Error
## Intercept -0.00 0.01
## sigma_Intercept -1.49 0.05
## lo_ground_truth 0.37 0.03
## meansTRUE -0.03 0.01
## sd_diff15 0.03 0.01
## conditionHOPs -0.04 0.02
## conditionintervals -0.02 0.02
## conditionQDPs 0.02 0.02
## lo_ground_truth:meansTRUE -0.01 0.01
## lo_ground_truth:sd_diff15 0.11 0.01
## meansTRUE:sd_diff15 0.02 0.02
## lo_ground_truth:conditionHOPs -0.03 0.05
## lo_ground_truth:conditionintervals -0.06 0.05
## lo_ground_truth:conditionQDPs 0.13 0.05
## meansTRUE:conditionHOPs 0.02 0.02
## meansTRUE:conditionintervals 0.02 0.02
## meansTRUE:conditionQDPs -0.02 0.02
## sd_diff15:conditionHOPs 0.02 0.02
## sd_diff15:conditionintervals 0.01 0.02
## sd_diff15:conditionQDPs -0.02 0.02
## lo_ground_truth:meansTRUE:sd_diff15 0.05 0.01
## lo_ground_truth:meansTRUE:conditionHOPs 0.01 0.02
## lo_ground_truth:meansTRUE:conditionintervals -0.01 0.01
## lo_ground_truth:meansTRUE:conditionQDPs 0.01 0.01
## lo_ground_truth:sd_diff15:conditionHOPs 0.06 0.02
## lo_ground_truth:sd_diff15:conditionintervals -0.01 0.01
## lo_ground_truth:sd_diff15:conditionQDPs 0.01 0.01
## meansTRUE:sd_diff15:conditionHOPs 0.02 0.03
## meansTRUE:sd_diff15:conditionintervals -0.01 0.02
## meansTRUE:sd_diff15:conditionQDPs 0.01 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.05 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 0.02 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.03 0.02
## sigma_lo_ground_truth 0.37 0.02
## l-95% CI u-95% CI Rhat
## Intercept -0.03 0.02 1.00
## sigma_Intercept -1.60 -1.39 1.00
## lo_ground_truth 0.31 0.44 1.00
## meansTRUE -0.05 -0.00 1.00
## sd_diff15 0.01 0.06 1.00
## conditionHOPs -0.08 -0.01 1.00
## conditionintervals -0.05 0.01 1.00
## conditionQDPs -0.01 0.05 1.00
## lo_ground_truth:meansTRUE -0.03 0.01 1.00
## lo_ground_truth:sd_diff15 0.09 0.13 1.00
## meansTRUE:sd_diff15 -0.00 0.05 1.00
## lo_ground_truth:conditionHOPs -0.13 0.07 1.00
## lo_ground_truth:conditionintervals -0.16 0.03 1.00
## lo_ground_truth:conditionQDPs 0.04 0.23 1.00
## meansTRUE:conditionHOPs -0.02 0.06 1.00
## meansTRUE:conditionintervals -0.01 0.05 1.00
## meansTRUE:conditionQDPs -0.05 0.01 1.00
## sd_diff15:conditionHOPs -0.03 0.06 1.00
## sd_diff15:conditionintervals -0.02 0.05 1.00
## sd_diff15:conditionQDPs -0.05 0.02 1.00
## lo_ground_truth:meansTRUE:sd_diff15 0.02 0.08 1.00
## lo_ground_truth:meansTRUE:conditionHOPs -0.02 0.05 1.00
## lo_ground_truth:meansTRUE:conditionintervals -0.04 0.01 1.00
## lo_ground_truth:meansTRUE:conditionQDPs -0.02 0.03 1.00
## lo_ground_truth:sd_diff15:conditionHOPs 0.03 0.09 1.00
## lo_ground_truth:sd_diff15:conditionintervals -0.04 0.01 1.00
## lo_ground_truth:sd_diff15:conditionQDPs -0.02 0.04 1.00
## meansTRUE:sd_diff15:conditionHOPs -0.04 0.07 1.00
## meansTRUE:sd_diff15:conditionintervals -0.05 0.03 1.00
## meansTRUE:sd_diff15:conditionQDPs -0.03 0.05 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.10 -0.01 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals -0.02 0.05 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.07 0.00 1.00
## sigma_lo_ground_truth 0.33 0.40 1.00
## Bulk_ESS Tail_ESS
## Intercept 5323 7694
## sigma_Intercept 1284 2414
## lo_ground_truth 3506 5632
## meansTRUE 4770 7049
## sd_diff15 5256 7418
## conditionHOPs 5876 7651
## conditionintervals 5103 7481
## conditionQDPs 5054 7952
## lo_ground_truth:meansTRUE 5052 7159
## lo_ground_truth:sd_diff15 5312 7748
## meansTRUE:sd_diff15 5271 7392
## lo_ground_truth:conditionHOPs 3997 6671
## lo_ground_truth:conditionintervals 3552 5534
## lo_ground_truth:conditionQDPs 3307 5615
## meansTRUE:conditionHOPs 5579 7979
## meansTRUE:conditionintervals 5013 7187
## meansTRUE:conditionQDPs 4883 6660
## sd_diff15:conditionHOPs 6009 7550
## sd_diff15:conditionintervals 5345 7074
## sd_diff15:conditionQDPs 5530 7099
## lo_ground_truth:meansTRUE:sd_diff15 4672 6898
## lo_ground_truth:meansTRUE:conditionHOPs 5974 7838
## lo_ground_truth:meansTRUE:conditionintervals 5542 8088
## lo_ground_truth:meansTRUE:conditionQDPs 5703 7616
## lo_ground_truth:sd_diff15:conditionHOPs 6489 7472
## lo_ground_truth:sd_diff15:conditionintervals 5584 7933
## lo_ground_truth:sd_diff15:conditionQDPs 6058 7697
## meansTRUE:sd_diff15:conditionHOPs 6059 7540
## meansTRUE:sd_diff15:conditionintervals 5890 7362
## meansTRUE:sd_diff15:conditionQDPs 5597 8208
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 5476 7261
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 5209 7035
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 5105 7481
## sigma_lo_ground_truth 2108 3513
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Mixed Effects of the Interaction of Means and Uncertainty Shown on Sigma
We tried many variations of models that add mixed effects on residual variance (sigma) for the interaction of extrinsic means and uncertainty shown, but we were unable to achieve a usable fit. Multiple versions of the model ran for days before the chains finished sampling. The model below is the best version we managed to fit, but it still has some divergent samples. All of this suggests that we may be better off modeling these data without means*sd_diff as a predictor of sigma.
# minimal LLO model with random effects for means, sd_diff, as well as ground truth, means, sd_diff for sigma submodel
m.m.llo.r_means.sd.sigma_gt.means.sd <- brm(
  data = model_df, family = "gaussian",
  formula = bf(
    lo_p_sup ~ (1 + lo_ground_truth + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff*condition,
    sigma ~ (1 + lo_ground_truth + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff
  ),
  prior = c(prior(normal(1, 0.5), class = b),
            prior(normal(1.3, 1), class = Intercept),
            prior(normal(0, 0.15), class = sd, group = worker_id),
            prior(normal(0, 0.3), class = b, dpar = sigma),
            prior(normal(0, 0.15), class = sd, dpar = sigma),
            prior(lkj(4), class = cor)),
  iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
  control = list(adapt_delta = 0.99, max_treedepth = 12),
  file = "model-fits/llo_mdl-min-r_means_sd_sigma_gt_means_sd")
summary(m.m.llo.r_means.sd.sigma_gt.means.sd)
## Warning: There were 182 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Family: gaussian
## Links: mu = identity; sigma = log
## Formula: lo_p_sup ~ (1 + lo_ground_truth + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff * condition
## sigma ~ (1 + lo_ground_truth + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
## total post-warmup samples = 10000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error
## sd(Intercept) 0.05 0.01
## sd(lo_ground_truth) 0.39 0.01
## sd(meansTRUE) 0.06 0.01
## sd(sd_diff15) 0.07 0.01
## sd(meansTRUE:sd_diff15) 0.05 0.01
## sd(sigma_Intercept) 1.33 0.04
## sd(sigma_lo_ground_truth) 0.40 0.01
## sd(sigma_meansTRUE) 0.91 0.03
## sd(sigma_sd_diff15) 0.59 0.03
## sd(sigma_meansTRUE:sd_diff15) 0.63 0.04
## cor(Intercept,lo_ground_truth) -0.41 0.10
## cor(Intercept,meansTRUE) -0.33 0.12
## cor(lo_ground_truth,meansTRUE) -0.57 0.09
## cor(Intercept,sd_diff15) 0.16 0.11
## cor(lo_ground_truth,sd_diff15) 0.01 0.10
## cor(meansTRUE,sd_diff15) -0.06 0.11
## cor(Intercept,meansTRUE:sd_diff15) -0.49 0.13
## cor(lo_ground_truth,meansTRUE:sd_diff15) 0.31 0.15
## cor(meansTRUE,meansTRUE:sd_diff15) 0.26 0.16
## cor(sd_diff15,meansTRUE:sd_diff15) -0.19 0.15
## cor(sigma_Intercept,sigma_lo_ground_truth) -0.62 0.03
## cor(sigma_Intercept,sigma_meansTRUE) -0.15 0.04
## cor(sigma_lo_ground_truth,sigma_meansTRUE) -0.14 0.05
## cor(sigma_Intercept,sigma_sd_diff15) -0.48 0.04
## cor(sigma_lo_ground_truth,sigma_sd_diff15) 0.11 0.05
## cor(sigma_meansTRUE,sigma_sd_diff15) 0.15 0.05
## cor(sigma_Intercept,sigma_meansTRUE:sd_diff15) -0.06 0.05
## cor(sigma_lo_ground_truth,sigma_meansTRUE:sd_diff15) 0.11 0.06
## cor(sigma_meansTRUE,sigma_meansTRUE:sd_diff15) -0.55 0.04
## cor(sigma_sd_diff15,sigma_meansTRUE:sd_diff15) -0.29 0.06
## l-95% CI u-95% CI Rhat
## sd(Intercept) 0.04 0.07 1.00
## sd(lo_ground_truth) 0.37 0.42 1.00
## sd(meansTRUE) 0.05 0.07 1.00
## sd(sd_diff15) 0.06 0.08 1.00
## sd(meansTRUE:sd_diff15) 0.04 0.07 1.00
## sd(sigma_Intercept) 1.26 1.41 1.00
## sd(sigma_lo_ground_truth) 0.37 0.42 1.00
## sd(sigma_meansTRUE) 0.85 0.97 1.00
## sd(sigma_sd_diff15) 0.54 0.65 1.00
## sd(sigma_meansTRUE:sd_diff15) 0.56 0.70 1.00
## cor(Intercept,lo_ground_truth) -0.58 -0.20 1.00
## cor(Intercept,meansTRUE) -0.54 -0.08 1.00
## cor(lo_ground_truth,meansTRUE) -0.73 -0.39 1.00
## cor(Intercept,sd_diff15) -0.06 0.39 1.00
## cor(lo_ground_truth,sd_diff15) -0.19 0.20 1.00
## cor(meansTRUE,sd_diff15) -0.28 0.16 1.00
## cor(Intercept,meansTRUE:sd_diff15) -0.73 -0.21 1.00
## cor(lo_ground_truth,meansTRUE:sd_diff15) 0.01 0.57 1.00
## cor(meansTRUE,meansTRUE:sd_diff15) -0.07 0.57 1.00
## cor(sd_diff15,meansTRUE:sd_diff15) -0.45 0.12 1.00
## cor(sigma_Intercept,sigma_lo_ground_truth) -0.67 -0.57 1.00
## cor(sigma_Intercept,sigma_meansTRUE) -0.23 -0.07 1.00
## cor(sigma_lo_ground_truth,sigma_meansTRUE) -0.23 -0.05 1.00
## cor(sigma_Intercept,sigma_sd_diff15) -0.55 -0.40 1.00
## cor(sigma_lo_ground_truth,sigma_sd_diff15) 0.01 0.21 1.00
## cor(sigma_meansTRUE,sigma_sd_diff15) 0.04 0.25 1.00
## cor(sigma_Intercept,sigma_meansTRUE:sd_diff15) -0.16 0.05 1.00
## cor(sigma_lo_ground_truth,sigma_meansTRUE:sd_diff15) -0.00 0.22 1.00
## cor(sigma_meansTRUE,sigma_meansTRUE:sd_diff15) -0.63 -0.47 1.00
## cor(sigma_sd_diff15,sigma_meansTRUE:sd_diff15) -0.41 -0.17 1.00
## Bulk_ESS Tail_ESS
## sd(Intercept) 1513 2905
## sd(lo_ground_truth) 1459 3766
## sd(meansTRUE) 516 1120
## sd(sd_diff15) 2547 5089
## sd(meansTRUE:sd_diff15) 1727 3595
## sd(sigma_Intercept) 1410 2622
## sd(sigma_lo_ground_truth) 1826 4026
## sd(sigma_meansTRUE) 2303 4370
## sd(sigma_sd_diff15) 1827 3331
## sd(sigma_meansTRUE:sd_diff15) 1854 3483
## cor(Intercept,lo_ground_truth) 207 170
## cor(Intercept,meansTRUE) 444 1133
## cor(lo_ground_truth,meansTRUE) 1064 2150
## cor(Intercept,sd_diff15) 1710 3420
## cor(lo_ground_truth,sd_diff15) 2023 4948
## cor(meansTRUE,sd_diff15) 1015 2607
## cor(Intercept,meansTRUE:sd_diff15) 2250 4260
## cor(lo_ground_truth,meansTRUE:sd_diff15) 2895 6601
## cor(meansTRUE,meansTRUE:sd_diff15) 1703 4274
## cor(sd_diff15,meansTRUE:sd_diff15) 2231 5074
## cor(sigma_Intercept,sigma_lo_ground_truth) 2080 4120
## cor(sigma_Intercept,sigma_meansTRUE) 1708 3566
## cor(sigma_lo_ground_truth,sigma_meansTRUE) 1287 2771
## cor(sigma_Intercept,sigma_sd_diff15) 2935 5007
## cor(sigma_lo_ground_truth,sigma_sd_diff15) 2564 4403
## cor(sigma_meansTRUE,sigma_sd_diff15) 1969 3275
## cor(sigma_Intercept,sigma_meansTRUE:sd_diff15) 2542 4288
## cor(sigma_lo_ground_truth,sigma_meansTRUE:sd_diff15) 2270 4723
## cor(sigma_meansTRUE,sigma_meansTRUE:sd_diff15) 3156 5458
## cor(sigma_sd_diff15,sigma_meansTRUE:sd_diff15) 1674 3586
##
## Population-Level Effects:
## Estimate Est.Error
## Intercept 0.00 0.01
## sigma_Intercept -1.75 0.06
## lo_ground_truth 0.35 0.03
## meansTRUE -0.03 0.01
## sd_diff15 0.03 0.01
## conditionHOPs -0.04 0.01
## conditionintervals -0.01 0.01
## conditionQDPs 0.01 0.01
## lo_ground_truth:meansTRUE 0.00 0.01
## lo_ground_truth:sd_diff15 0.11 0.01
## meansTRUE:sd_diff15 0.02 0.01
## lo_ground_truth:conditionHOPs -0.08 0.05
## lo_ground_truth:conditionintervals -0.08 0.05
## lo_ground_truth:conditionQDPs 0.14 0.05
## meansTRUE:conditionHOPs 0.03 0.01
## meansTRUE:conditionintervals 0.02 0.01
## meansTRUE:conditionQDPs -0.02 0.01
## sd_diff15:conditionHOPs 0.01 0.02
## sd_diff15:conditionintervals 0.01 0.02
## sd_diff15:conditionQDPs 0.00 0.02
## lo_ground_truth:meansTRUE:sd_diff15 0.05 0.01
## lo_ground_truth:meansTRUE:conditionHOPs 0.00 0.01
## lo_ground_truth:meansTRUE:conditionintervals -0.01 0.01
## lo_ground_truth:meansTRUE:conditionQDPs -0.00 0.01
## lo_ground_truth:sd_diff15:conditionHOPs 0.07 0.02
## lo_ground_truth:sd_diff15:conditionintervals 0.00 0.01
## lo_ground_truth:sd_diff15:conditionQDPs -0.00 0.01
## meansTRUE:sd_diff15:conditionHOPs 0.03 0.02
## meansTRUE:sd_diff15:conditionintervals 0.00 0.02
## meansTRUE:sd_diff15:conditionQDPs 0.00 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.06 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals -0.01 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.02 0.02
## sigma_lo_ground_truth 0.39 0.02
## sigma_meansTRUE -0.16 0.05
## sigma_sd_diff15 0.21 0.04
## sigma_lo_ground_truth:meansTRUE 0.00 0.02
## sigma_lo_ground_truth:sd_diff15 0.02 0.02
## sigma_meansTRUE:sd_diff15 0.12 0.05
## sigma_lo_ground_truth:meansTRUE:sd_diff15 -0.02 0.03
## l-95% CI u-95% CI Rhat
## Intercept -0.01 0.02 1.00
## sigma_Intercept -1.87 -1.64 1.00
## lo_ground_truth 0.29 0.41 1.00
## meansTRUE -0.05 -0.01 1.00
## sd_diff15 0.01 0.05 1.00
## conditionHOPs -0.06 -0.01 1.00
## conditionintervals -0.03 0.01 1.00
## conditionQDPs -0.02 0.03 1.00
## lo_ground_truth:meansTRUE -0.01 0.01 1.00
## lo_ground_truth:sd_diff15 0.09 0.13 1.00
## meansTRUE:sd_diff15 -0.01 0.05 1.00
## lo_ground_truth:conditionHOPs -0.17 0.01 1.01
## lo_ground_truth:conditionintervals -0.16 0.01 1.00
## lo_ground_truth:conditionQDPs 0.05 0.23 1.00
## meansTRUE:conditionHOPs 0.01 0.06 1.00
## meansTRUE:conditionintervals -0.00 0.04 1.00
## meansTRUE:conditionQDPs -0.04 0.01 1.00
## sd_diff15:conditionHOPs -0.04 0.05 1.00
## sd_diff15:conditionintervals -0.02 0.04 1.00
## sd_diff15:conditionQDPs -0.03 0.04 1.00
## lo_ground_truth:meansTRUE:sd_diff15 0.02 0.08 1.00
## lo_ground_truth:meansTRUE:conditionHOPs -0.01 0.03 1.00
## lo_ground_truth:meansTRUE:conditionintervals -0.03 0.01 1.00
## lo_ground_truth:meansTRUE:conditionQDPs -0.02 0.01 1.00
## lo_ground_truth:sd_diff15:conditionHOPs 0.03 0.10 1.00
## lo_ground_truth:sd_diff15:conditionintervals -0.02 0.03 1.00
## lo_ground_truth:sd_diff15:conditionQDPs -0.03 0.02 1.00
## meansTRUE:sd_diff15:conditionHOPs -0.02 0.07 1.00
## meansTRUE:sd_diff15:conditionintervals -0.03 0.04 1.00
## meansTRUE:sd_diff15:conditionQDPs -0.03 0.04 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.11 -0.02 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals -0.05 0.02 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.05 0.02 1.00
## sigma_lo_ground_truth 0.35 0.44 1.00
## sigma_meansTRUE -0.26 -0.07 1.00
## sigma_sd_diff15 0.14 0.29 1.00
## sigma_lo_ground_truth:meansTRUE -0.04 0.05 1.00
## sigma_lo_ground_truth:sd_diff15 -0.02 0.06 1.00
## sigma_meansTRUE:sd_diff15 0.02 0.22 1.00
## sigma_lo_ground_truth:meansTRUE:sd_diff15 -0.07 0.04 1.00
## Bulk_ESS Tail_ESS
## Intercept 1454 2969
## sigma_Intercept 747 1042
## lo_ground_truth 847 2098
## meansTRUE 1249 2739
## sd_diff15 2251 3921
## conditionHOPs 2102 3570
## conditionintervals 1712 2627
## conditionQDPs 1263 3160
## lo_ground_truth:meansTRUE 3058 5845
## lo_ground_truth:sd_diff15 2496 4809
## meansTRUE:sd_diff15 2131 3784
## lo_ground_truth:conditionHOPs 1013 2465
## lo_ground_truth:conditionintervals 966 2011
## lo_ground_truth:conditionQDPs 938 2076
## meansTRUE:conditionHOPs 1682 3250
## meansTRUE:conditionintervals 1296 2837
## meansTRUE:conditionQDPs 1272 2405
## sd_diff15:conditionHOPs 3094 5492
## sd_diff15:conditionintervals 2308 3973
## sd_diff15:conditionQDPs 2572 4404
## lo_ground_truth:meansTRUE:sd_diff15 2117 3513
## lo_ground_truth:meansTRUE:conditionHOPs 2902 4918
## lo_ground_truth:meansTRUE:conditionintervals 3386 6188
## lo_ground_truth:meansTRUE:conditionQDPs 3259 6243
## lo_ground_truth:sd_diff15:conditionHOPs 3127 5422
## lo_ground_truth:sd_diff15:conditionintervals 2847 5773
## lo_ground_truth:sd_diff15:conditionQDPs 2780 5355
## meansTRUE:sd_diff15:conditionHOPs 2808 5188
## meansTRUE:sd_diff15:conditionintervals 2298 3884
## meansTRUE:sd_diff15:conditionQDPs 2536 5075
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 2663 5009
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 2401 3984
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 2483 4674
## sigma_lo_ground_truth 1492 2465
## sigma_meansTRUE 1755 3479
## sigma_sd_diff15 2374 4410
## sigma_lo_ground_truth:meansTRUE 3591 5784
## sigma_lo_ground_truth:sd_diff15 3412 5977
## sigma_meansTRUE:sd_diff15 2984 5166
## sigma_lo_ground_truth:meansTRUE:sd_diff15 3489 5940
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Mixed Effects of Trial Order on Mean Response
Building on our model with ground truth as a predictor of sigma, this model adds mixed effects of trial order on mean response. This effectively models a learning effect on the mean response at each level of ground truth.
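The lo_ground_truth*trial interaction lets the slope drift across trials. A toy sketch on the log odds scale, with made-up coefficients (not fitted estimates), of how such a learning effect would manifest:

```r
# under a lo_ground_truth*trial interaction, the slope at a given trial is
# slope(trial) = b_gt + b_gt_trial * trial; made-up coefficients below
b_gt <- 0.4        # hypothetical baseline slope
b_gt_trial <- 0.01 # hypothetical learning rate per trial
slope_at_trial <- function(trial) b_gt + b_gt_trial * trial
slope_at_trial(c(1, 16, 32)) # 0.41 0.56 0.72: slope creeps toward 1
```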
# minimal LLO model with random effects for means, sd_diff, trial as well as ground truth for sigma submodel
m.m.llo.r_means.sd.trial.sigma_gt <- brm(
  data = model_df, family = "gaussian",
  formula = bf(
    lo_p_sup ~ (1 + lo_ground_truth*trial + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff*condition + lo_ground_truth*condition*trial,
    sigma ~ (1 + lo_ground_truth|worker_id) + lo_ground_truth
  ),
  prior = c(prior(normal(1, 0.5), class = b),
            prior(normal(1.3, 1), class = Intercept),
            prior(normal(0, 0.15), class = sd, group = worker_id),
            prior(normal(0, 0.3), class = b, dpar = sigma),
            prior(normal(0, 0.15), class = sd, dpar = sigma),
            prior(lkj(4), class = cor)),
  iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
  control = list(adapt_delta = 0.99, max_treedepth = 12),
  file = "model-fits/llo_mdl-min-r_means_sd_trial_sigma_gt3")
summary(m.m.llo.r_means.sd.trial.sigma_gt)
## Family: gaussian
## Links: mu = identity; sigma = log
## Formula: lo_p_sup ~ (1 + lo_ground_truth * trial + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff * condition + lo_ground_truth * condition * trial
## sigma ~ (1 + lo_ground_truth | worker_id) + lo_ground_truth
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
## total post-warmup samples = 10000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI
## sd(Intercept) 0.07 0.01 0.06
## sd(lo_ground_truth) 0.41 0.01 0.38
## sd(trial) 0.03 0.02 0.00
## sd(meansTRUE) 0.03 0.01 0.02
## sd(sd_diff15) 0.08 0.01 0.07
## sd(lo_ground_truth:trial) 0.28 0.02 0.25
## sd(meansTRUE:sd_diff15) 0.06 0.01 0.04
## sd(sigma_Intercept) 1.25 0.03 1.19
## sd(sigma_lo_ground_truth) 0.45 0.01 0.42
## cor(Intercept,lo_ground_truth) -0.43 0.09 -0.58
## cor(Intercept,trial) 0.16 0.23 -0.34
## cor(lo_ground_truth,trial) -0.20 0.24 -0.61
## cor(Intercept,meansTRUE) -0.02 0.18 -0.37
## cor(lo_ground_truth,meansTRUE) -0.64 0.13 -0.84
## cor(trial,meansTRUE) 0.16 0.25 -0.36
## cor(Intercept,sd_diff15) 0.06 0.11 -0.16
## cor(lo_ground_truth,sd_diff15) 0.00 0.09 -0.17
## cor(trial,sd_diff15) 0.11 0.21 -0.33
## cor(meansTRUE,sd_diff15) 0.03 0.16 -0.29
## cor(Intercept,lo_ground_truth:trial) -0.30 0.09 -0.48
## cor(lo_ground_truth,lo_ground_truth:trial) 0.38 0.06 0.26
## cor(trial,lo_ground_truth:trial) -0.18 0.23 -0.59
## cor(meansTRUE,lo_ground_truth:trial) -0.16 0.16 -0.46
## cor(sd_diff15,lo_ground_truth:trial) -0.02 0.09 -0.19
## cor(Intercept,meansTRUE:sd_diff15) -0.38 0.14 -0.63
## cor(lo_ground_truth,meansTRUE:sd_diff15) 0.27 0.14 -0.01
## cor(trial,meansTRUE:sd_diff15) 0.06 0.23 -0.40
## cor(meansTRUE,meansTRUE:sd_diff15) -0.07 0.19 -0.43
## cor(sd_diff15,meansTRUE:sd_diff15) -0.31 0.12 -0.54
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15) 0.02 0.13 -0.24
## cor(sigma_Intercept,sigma_lo_ground_truth) -0.74 0.02 -0.77
## u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.08 1.00 3140 5794
## sd(lo_ground_truth) 0.43 1.00 1884 5204
## sd(trial) 0.06 1.00 1276 3767
## sd(meansTRUE) 0.05 1.00 1266 3688
## sd(sd_diff15) 0.10 1.00 4351 6757
## sd(lo_ground_truth:trial) 0.31 1.00 2720 5994
## sd(meansTRUE:sd_diff15) 0.07 1.00 3349 5452
## sd(sigma_Intercept) 1.32 1.00 2285 4270
## sd(sigma_lo_ground_truth) 0.47 1.00 3299 5987
## cor(Intercept,lo_ground_truth) -0.24 1.00 348 790
## cor(Intercept,trial) 0.57 1.00 5497 7326
## cor(lo_ground_truth,trial) 0.31 1.00 4534 7555
## cor(Intercept,meansTRUE) 0.35 1.00 2077 4976
## cor(lo_ground_truth,meansTRUE) -0.34 1.00 3228 5854
## cor(trial,meansTRUE) 0.61 1.00 2335 4563
## cor(Intercept,sd_diff15) 0.27 1.00 3127 5812
## cor(lo_ground_truth,sd_diff15) 0.18 1.00 3941 7927
## cor(trial,sd_diff15) 0.50 1.01 312 893
## cor(meansTRUE,sd_diff15) 0.35 1.00 688 1323
## cor(Intercept,lo_ground_truth:trial) -0.12 1.00 962 2781
## cor(lo_ground_truth,lo_ground_truth:trial) 0.50 1.00 6000 7388
## cor(trial,lo_ground_truth:trial) 0.31 1.00 320 883
## cor(meansTRUE,lo_ground_truth:trial) 0.16 1.00 359 887
## cor(sd_diff15,lo_ground_truth:trial) 0.15 1.00 2375 5128
## cor(Intercept,meansTRUE:sd_diff15) -0.11 1.00 3970 7025
## cor(lo_ground_truth,meansTRUE:sd_diff15) 0.52 1.00 4785 7182
## cor(trial,meansTRUE:sd_diff15) 0.49 1.00 1325 2513
## cor(meansTRUE,meansTRUE:sd_diff15) 0.32 1.00 2270 5051
## cor(sd_diff15,meansTRUE:sd_diff15) -0.05 1.00 3540 6986
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15) 0.27 1.00 1986 6033
## cor(sigma_Intercept,sigma_lo_ground_truth) -0.70 1.00 3770 6475
##
## Population-Level Effects:
## Estimate Est.Error
## Intercept -0.01 0.01
## sigma_Intercept -1.50 0.05
## lo_ground_truth 0.38 0.03
## meansTRUE -0.02 0.01
## sd_diff15 0.03 0.01
## conditionHOPs -0.05 0.02
## conditionintervals -0.02 0.01
## conditionQDPs 0.01 0.01
## trial -0.04 0.01
## lo_ground_truth:meansTRUE -0.02 0.01
## lo_ground_truth:sd_diff15 0.10 0.01
## meansTRUE:sd_diff15 0.02 0.01
## lo_ground_truth:conditionHOPs -0.02 0.05
## lo_ground_truth:conditionintervals -0.06 0.05
## lo_ground_truth:conditionQDPs 0.14 0.05
## meansTRUE:conditionHOPs 0.03 0.02
## meansTRUE:conditionintervals 0.03 0.01
## meansTRUE:conditionQDPs -0.01 0.01
## sd_diff15:conditionHOPs 0.02 0.02
## sd_diff15:conditionintervals 0.01 0.02
## sd_diff15:conditionQDPs -0.02 0.02
## lo_ground_truth:trial 0.10 0.03
## conditionHOPs:trial 0.07 0.03
## conditionintervals:trial 0.03 0.02
## conditionQDPs:trial 0.03 0.02
## lo_ground_truth:meansTRUE:sd_diff15 0.05 0.01
## lo_ground_truth:meansTRUE:conditionHOPs -0.01 0.02
## lo_ground_truth:meansTRUE:conditionintervals -0.01 0.02
## lo_ground_truth:meansTRUE:conditionQDPs -0.00 0.02
## lo_ground_truth:sd_diff15:conditionHOPs 0.06 0.02
## lo_ground_truth:sd_diff15:conditionintervals -0.02 0.01
## lo_ground_truth:sd_diff15:conditionQDPs 0.01 0.01
## meansTRUE:sd_diff15:conditionHOPs 0.02 0.03
## meansTRUE:sd_diff15:conditionintervals -0.02 0.02
## meansTRUE:sd_diff15:conditionQDPs 0.01 0.02
## lo_ground_truth:conditionHOPs:trial -0.10 0.04
## lo_ground_truth:conditionintervals:trial 0.01 0.04
## lo_ground_truth:conditionQDPs:trial -0.01 0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.06 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 0.02 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.03 0.02
## sigma_lo_ground_truth 0.35 0.02
## l-95% CI u-95% CI Rhat
## Intercept -0.03 0.01 1.00
## sigma_Intercept -1.60 -1.40 1.00
## lo_ground_truth 0.31 0.44 1.00
## meansTRUE -0.04 -0.00 1.00
## sd_diff15 0.01 0.06 1.00
## conditionHOPs -0.08 -0.01 1.00
## conditionintervals -0.05 0.00 1.00
## conditionQDPs -0.01 0.04 1.00
## trial -0.06 -0.01 1.00
## lo_ground_truth:meansTRUE -0.04 0.00 1.00
## lo_ground_truth:sd_diff15 0.09 0.12 1.00
## meansTRUE:sd_diff15 -0.00 0.05 1.00
## lo_ground_truth:conditionHOPs -0.11 0.08 1.00
## lo_ground_truth:conditionintervals -0.15 0.03 1.00
## lo_ground_truth:conditionQDPs 0.04 0.23 1.00
## meansTRUE:conditionHOPs -0.00 0.07 1.00
## meansTRUE:conditionintervals 0.00 0.05 1.00
## meansTRUE:conditionQDPs -0.04 0.02 1.00
## sd_diff15:conditionHOPs -0.03 0.06 1.00
## sd_diff15:conditionintervals -0.02 0.05 1.00
## sd_diff15:conditionQDPs -0.05 0.02 1.00
## lo_ground_truth:trial 0.05 0.16 1.00
## conditionHOPs:trial 0.02 0.12 1.00
## conditionintervals:trial -0.01 0.06 1.00
## conditionQDPs:trial -0.01 0.07 1.00
## lo_ground_truth:meansTRUE:sd_diff15 0.03 0.08 1.00
## lo_ground_truth:meansTRUE:conditionHOPs -0.04 0.03 1.00
## lo_ground_truth:meansTRUE:conditionintervals -0.04 0.02 1.00
## lo_ground_truth:meansTRUE:conditionQDPs -0.03 0.03 1.00
## lo_ground_truth:sd_diff15:conditionHOPs 0.03 0.09 1.00
## lo_ground_truth:sd_diff15:conditionintervals -0.04 0.01 1.00
## lo_ground_truth:sd_diff15:conditionQDPs -0.02 0.04 1.00
## meansTRUE:sd_diff15:conditionHOPs -0.03 0.07 1.00
## meansTRUE:sd_diff15:conditionintervals -0.05 0.02 1.00
## meansTRUE:sd_diff15:conditionQDPs -0.03 0.05 1.00
## lo_ground_truth:conditionHOPs:trial -0.19 -0.02 1.00
## lo_ground_truth:conditionintervals:trial -0.06 0.09 1.00
## lo_ground_truth:conditionQDPs:trial -0.09 0.07 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.10 -0.01 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals -0.01 0.06 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.07 0.00 1.00
## sigma_lo_ground_truth 0.32 0.39 1.00
## Bulk_ESS Tail_ESS
## Intercept 4325 5812
## sigma_Intercept 1241 2419
## lo_ground_truth 2043 4011
## meansTRUE 4265 7230
## sd_diff15 4178 6496
## conditionHOPs 5047 7572
## conditionintervals 4445 6526
## conditionQDPs 3535 5704
## trial 7362 8484
## lo_ground_truth:meansTRUE 5397 7792
## lo_ground_truth:sd_diff15 4899 7791
## meansTRUE:sd_diff15 4223 7238
## lo_ground_truth:conditionHOPs 2530 4736
## lo_ground_truth:conditionintervals 2268 4105
## lo_ground_truth:conditionQDPs 2248 4180
## meansTRUE:conditionHOPs 5959 7246
## meansTRUE:conditionintervals 4674 7345
## meansTRUE:conditionQDPs 4145 7443
## sd_diff15:conditionHOPs 5957 8107
## sd_diff15:conditionintervals 4997 7785
## sd_diff15:conditionQDPs 4740 6988
## lo_ground_truth:trial 4647 7841
## conditionHOPs:trial 7719 8604
## conditionintervals:trial 7360 8575
## conditionQDPs:trial 7092 7100
## lo_ground_truth:meansTRUE:sd_diff15 4533 7175
## lo_ground_truth:meansTRUE:conditionHOPs 6238 7639
## lo_ground_truth:meansTRUE:conditionintervals 5875 7883
## lo_ground_truth:meansTRUE:conditionQDPs 5722 8075
## lo_ground_truth:sd_diff15:conditionHOPs 6215 7948
## lo_ground_truth:sd_diff15:conditionintervals 5246 7799
## lo_ground_truth:sd_diff15:conditionQDPs 5462 7555
## meansTRUE:sd_diff15:conditionHOPs 5977 8312
## meansTRUE:sd_diff15:conditionintervals 4868 7095
## meansTRUE:sd_diff15:conditionQDPs 5018 7536
## lo_ground_truth:conditionHOPs:trial 5657 8005
## lo_ground_truth:conditionintervals:trial 4630 7588
## lo_ground_truth:conditionQDPs:trial 4693 7329
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 5525 8047
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 4919 7513
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 5144 7796
## sigma_lo_ground_truth 1833 3589
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
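Since the sigma submodel uses a log link (per the Links line in the summary), its coefficients are on the log scale and need to be exponentiated before they can be read as standard deviations. A quick back-transformation of the population-level sigma intercept (a sketch; the value -1.50 is the posterior mean copied from the summary above):

```r
# posterior mean of sigma_Intercept, on the log scale (from the summary above)
sigma_intercept <- -1.50
# back-transform to the response scale: the typical residual SD in log odds
# units for the reference condition when lo_ground_truth = 0
exp(sigma_intercept)  # ~0.22
```

In a full workflow we would back-transform the posterior draws rather than the point estimate, since the exponential of a posterior mean is not the posterior mean of the exponential.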
Mixed Effects of Trial Order on Sigma
This model adds fixed and random effects of trial order to the sigma submodel, modeling a learning effect on residual variance.
# minimal LLO model with random effects for means, sd_diff, trial as well as ground truth and trial for sigma submodel
m.m.llo.r_means.sd.trial.sigma_gt.trial <- brm(
data = model_df, family = "gaussian",
formula = bf(lo_p_sup ~ (1 + lo_ground_truth*trial + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff*condition + lo_ground_truth*condition*trial,
sigma ~ (1 + lo_ground_truth + trial|worker_id) + lo_ground_truth*condition*trial),
prior = c(prior(normal(1, 0.5), class = b),
prior(normal(1.3, 1), class = Intercept),
prior(normal(0, 0.15), class = sd, group = worker_id),
prior(normal(0, 0.3), class = b, dpar = sigma),
prior(normal(0, 0.15), class = sd, dpar = sigma),
prior(lkj(4), class = cor)),
iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
control = list(adapt_delta = 0.99, max_treedepth = 12),
file = "model-fits/llo_mdl-min-r_means_sd_trial_sigma_gt_trial3b")
summary(m.m.llo.r_means.sd.trial.sigma_gt.trial)
## Family: gaussian
## Links: mu = identity; sigma = log
## Formula: lo_p_sup ~ (1 + lo_ground_truth * trial + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff * condition + lo_ground_truth * condition * trial
## sigma ~ (1 + lo_ground_truth + trial | worker_id) + lo_ground_truth * condition * trial
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
## total post-warmup samples = 10000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI
## sd(Intercept) 0.06 0.01 0.05
## sd(lo_ground_truth) 0.39 0.01 0.37
## sd(trial) 0.03 0.01 0.00
## sd(meansTRUE) 0.04 0.01 0.02
## sd(sd_diff15) 0.08 0.01 0.07
## sd(lo_ground_truth:trial) 0.24 0.02 0.21
## sd(meansTRUE:sd_diff15) 0.06 0.01 0.04
## sd(sigma_Intercept) 1.18 0.03 1.12
## sd(sigma_lo_ground_truth) 0.41 0.01 0.38
## sd(sigma_trial) 1.19 0.04 1.12
## cor(Intercept,lo_ground_truth) -0.43 0.09 -0.61
## cor(Intercept,trial) 0.19 0.22 -0.28
## cor(lo_ground_truth,trial) -0.30 0.22 -0.67
## cor(Intercept,meansTRUE) -0.00 0.17 -0.33
## cor(lo_ground_truth,meansTRUE) -0.66 0.11 -0.84
## cor(trial,meansTRUE) 0.29 0.24 -0.22
## cor(Intercept,sd_diff15) -0.00 0.11 -0.21
## cor(lo_ground_truth,sd_diff15) 0.01 0.09 -0.15
## cor(trial,sd_diff15) 0.01 0.22 -0.45
## cor(meansTRUE,sd_diff15) 0.02 0.15 -0.29
## cor(Intercept,lo_ground_truth:trial) -0.25 0.09 -0.43
## cor(lo_ground_truth,lo_ground_truth:trial) 0.41 0.06 0.29
## cor(trial,lo_ground_truth:trial) -0.40 0.22 -0.73
## cor(meansTRUE,lo_ground_truth:trial) -0.22 0.14 -0.48
## cor(sd_diff15,lo_ground_truth:trial) 0.06 0.08 -0.10
## cor(Intercept,meansTRUE:sd_diff15) -0.36 0.13 -0.61
## cor(lo_ground_truth,meansTRUE:sd_diff15) 0.24 0.13 -0.04
## cor(trial,meansTRUE:sd_diff15) 0.17 0.22 -0.28
## cor(meansTRUE,meansTRUE:sd_diff15) 0.05 0.18 -0.30
## cor(sd_diff15,meansTRUE:sd_diff15) -0.33 0.12 -0.54
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15) -0.13 0.12 -0.37
## cor(sigma_Intercept,sigma_lo_ground_truth) -0.71 0.02 -0.75
## cor(sigma_Intercept,sigma_trial) 0.10 0.04 0.02
## cor(sigma_lo_ground_truth,sigma_trial) -0.06 0.04 -0.14
## u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.07 1.00 3302 6168
## sd(lo_ground_truth) 0.42 1.00 2223 5191
## sd(trial) 0.06 1.00 1258 1948
## sd(meansTRUE) 0.05 1.00 1255 2869
## sd(sd_diff15) 0.09 1.00 3533 6335
## sd(lo_ground_truth:trial) 0.27 1.00 1830 4692
## sd(meansTRUE:sd_diff15) 0.07 1.00 3124 5466
## sd(sigma_Intercept) 1.25 1.00 2677 4515
## sd(sigma_lo_ground_truth) 0.43 1.00 3720 5944
## sd(sigma_trial) 1.27 1.00 6034 8269
## cor(Intercept,lo_ground_truth) -0.24 1.00 389 907
## cor(Intercept,trial) 0.58 1.00 6316 6887
## cor(lo_ground_truth,trial) 0.19 1.00 4054 5940
## cor(Intercept,meansTRUE) 0.35 1.00 1794 5101
## cor(lo_ground_truth,meansTRUE) -0.41 1.00 2969 4560
## cor(trial,meansTRUE) 0.68 1.00 1831 4791
## cor(Intercept,sd_diff15) 0.21 1.00 2371 5102
## cor(lo_ground_truth,sd_diff15) 0.19 1.00 3281 7010
## cor(trial,sd_diff15) 0.43 1.02 364 476
## cor(meansTRUE,sd_diff15) 0.30 1.00 483 923
## cor(Intercept,lo_ground_truth:trial) -0.06 1.00 1336 3068
## cor(lo_ground_truth,lo_ground_truth:trial) 0.53 1.00 5619 7916
## cor(trial,lo_ground_truth:trial) 0.11 1.00 355 823
## cor(meansTRUE,lo_ground_truth:trial) 0.06 1.00 726 1637
## cor(sd_diff15,lo_ground_truth:trial) 0.23 1.00 2849 5863
## cor(Intercept,meansTRUE:sd_diff15) -0.08 1.00 4202 7124
## cor(lo_ground_truth,meansTRUE:sd_diff15) 0.48 1.00 4514 8125
## cor(trial,meansTRUE:sd_diff15) 0.56 1.00 1148 2169
## cor(meansTRUE,meansTRUE:sd_diff15) 0.42 1.00 2117 4244
## cor(sd_diff15,meansTRUE:sd_diff15) -0.08 1.00 4177 6823
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15) 0.12 1.00 3506 7178
## cor(sigma_Intercept,sigma_lo_ground_truth) -0.67 1.00 4346 6844
## cor(sigma_Intercept,sigma_trial) 0.17 1.00 4789 6896
## cor(sigma_lo_ground_truth,sigma_trial) 0.03 1.00 3904 5935
##
## Population-Level Effects:
## Estimate Est.Error
## Intercept -0.01 0.01
## sigma_Intercept -1.79 0.09
## lo_ground_truth 0.38 0.03
## meansTRUE -0.02 0.01
## sd_diff15 0.04 0.01
## conditionHOPs -0.04 0.02
## conditionintervals -0.01 0.01
## conditionQDPs 0.01 0.01
## trial -0.03 0.01
## lo_ground_truth:meansTRUE -0.02 0.01
## lo_ground_truth:sd_diff15 0.10 0.01
## meansTRUE:sd_diff15 0.01 0.01
## lo_ground_truth:conditionHOPs -0.05 0.05
## lo_ground_truth:conditionintervals -0.08 0.05
## lo_ground_truth:conditionQDPs 0.14 0.05
## meansTRUE:conditionHOPs 0.03 0.02
## meansTRUE:conditionintervals 0.02 0.01
## meansTRUE:conditionQDPs -0.01 0.01
## sd_diff15:conditionHOPs 0.02 0.02
## sd_diff15:conditionintervals 0.01 0.02
## sd_diff15:conditionQDPs -0.01 0.02
## lo_ground_truth:trial 0.09 0.03
## conditionHOPs:trial 0.05 0.02
## conditionintervals:trial 0.02 0.02
## conditionQDPs:trial 0.03 0.02
## lo_ground_truth:meansTRUE:sd_diff15 0.06 0.01
## lo_ground_truth:meansTRUE:conditionHOPs -0.01 0.02
## lo_ground_truth:meansTRUE:conditionintervals 0.00 0.01
## lo_ground_truth:meansTRUE:conditionQDPs -0.01 0.01
## lo_ground_truth:sd_diff15:conditionHOPs 0.06 0.02
## lo_ground_truth:sd_diff15:conditionintervals -0.01 0.01
## lo_ground_truth:sd_diff15:conditionQDPs 0.01 0.01
## meansTRUE:sd_diff15:conditionHOPs 0.03 0.02
## meansTRUE:sd_diff15:conditionintervals -0.00 0.02
## meansTRUE:sd_diff15:conditionQDPs 0.01 0.02
## lo_ground_truth:conditionHOPs:trial -0.08 0.04
## lo_ground_truth:conditionintervals:trial -0.01 0.04
## lo_ground_truth:conditionQDPs:trial 0.00 0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.07 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals -0.00 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.04 0.02
## sigma_lo_ground_truth 0.46 0.03
## sigma_conditionHOPs 0.59 0.12
## sigma_conditionintervals 0.16 0.12
## sigma_conditionQDPs -0.06 0.12
## sigma_trial -0.30 0.10
## sigma_lo_ground_truth:conditionHOPs -0.17 0.05
## sigma_lo_ground_truth:conditionintervals -0.10 0.05
## sigma_lo_ground_truth:conditionQDPs -0.02 0.05
## sigma_lo_ground_truth:trial 0.03 0.05
## sigma_conditionHOPs:trial 0.09 0.14
## sigma_conditionintervals:trial 0.14 0.14
## sigma_conditionQDPs:trial -0.03 0.14
## sigma_lo_ground_truth:conditionHOPs:trial 0.03 0.07
## sigma_lo_ground_truth:conditionintervals:trial 0.06 0.07
## sigma_lo_ground_truth:conditionQDPs:trial -0.03 0.07
## l-95% CI u-95% CI Rhat
## Intercept -0.03 0.01 1.00
## sigma_Intercept -1.96 -1.62 1.00
## lo_ground_truth 0.31 0.44 1.00
## meansTRUE -0.04 0.00 1.00
## sd_diff15 0.01 0.06 1.00
## conditionHOPs -0.07 -0.01 1.00
## conditionintervals -0.04 0.01 1.00
## conditionQDPs -0.01 0.04 1.00
## trial -0.06 -0.01 1.00
## lo_ground_truth:meansTRUE -0.04 -0.00 1.00
## lo_ground_truth:sd_diff15 0.08 0.12 1.00
## meansTRUE:sd_diff15 -0.01 0.04 1.00
## lo_ground_truth:conditionHOPs -0.14 0.04 1.00
## lo_ground_truth:conditionintervals -0.17 0.01 1.00
## lo_ground_truth:conditionQDPs 0.05 0.23 1.00
## meansTRUE:conditionHOPs -0.00 0.07 1.00
## meansTRUE:conditionintervals -0.01 0.05 1.00
## meansTRUE:conditionQDPs -0.04 0.02 1.00
## sd_diff15:conditionHOPs -0.03 0.06 1.00
## sd_diff15:conditionintervals -0.02 0.05 1.00
## sd_diff15:conditionQDPs -0.05 0.02 1.00
## lo_ground_truth:trial 0.04 0.14 1.00
## conditionHOPs:trial 0.00 0.09 1.00
## conditionintervals:trial -0.01 0.06 1.00
## conditionQDPs:trial -0.00 0.07 1.00
## lo_ground_truth:meansTRUE:sd_diff15 0.04 0.09 1.00
## lo_ground_truth:meansTRUE:conditionHOPs -0.04 0.03 1.00
## lo_ground_truth:meansTRUE:conditionintervals -0.03 0.03 1.00
## lo_ground_truth:meansTRUE:conditionQDPs -0.04 0.02 1.00
## lo_ground_truth:sd_diff15:conditionHOPs 0.03 0.10 1.00
## lo_ground_truth:sd_diff15:conditionintervals -0.04 0.01 1.00
## lo_ground_truth:sd_diff15:conditionQDPs -0.02 0.03 1.00
## meansTRUE:sd_diff15:conditionHOPs -0.02 0.07 1.00
## meansTRUE:sd_diff15:conditionintervals -0.04 0.03 1.00
## meansTRUE:sd_diff15:conditionQDPs -0.03 0.05 1.00
## lo_ground_truth:conditionHOPs:trial -0.15 -0.00 1.00
## lo_ground_truth:conditionintervals:trial -0.08 0.06 1.00
## lo_ground_truth:conditionQDPs:trial -0.07 0.07 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.11 -0.03 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals -0.04 0.03 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.07 -0.00 1.00
## sigma_lo_ground_truth 0.39 0.52 1.00
## sigma_conditionHOPs 0.34 0.83 1.00
## sigma_conditionintervals -0.07 0.40 1.00
## sigma_conditionQDPs -0.29 0.18 1.00
## sigma_trial -0.49 -0.11 1.00
## sigma_lo_ground_truth:conditionHOPs -0.26 -0.08 1.00
## sigma_lo_ground_truth:conditionintervals -0.19 -0.00 1.00
## sigma_lo_ground_truth:conditionQDPs -0.11 0.07 1.00
## sigma_lo_ground_truth:trial -0.06 0.13 1.00
## sigma_conditionHOPs:trial -0.19 0.36 1.00
## sigma_conditionintervals:trial -0.14 0.41 1.00
## sigma_conditionQDPs:trial -0.30 0.25 1.00
## sigma_lo_ground_truth:conditionHOPs:trial -0.10 0.17 1.00
## sigma_lo_ground_truth:conditionintervals:trial -0.08 0.20 1.00
## sigma_lo_ground_truth:conditionQDPs:trial -0.16 0.11 1.00
## Bulk_ESS Tail_ESS
## Intercept 4081 6648
## sigma_Intercept 1821 3365
## lo_ground_truth 2678 4307
## meansTRUE 4072 6679
## sd_diff15 4534 6510
## conditionHOPs 5681 7897
## conditionintervals 4970 7053
## conditionQDPs 3347 6349
## trial 6496 8008
## lo_ground_truth:meansTRUE 4976 7467
## lo_ground_truth:sd_diff15 4477 7558
## meansTRUE:sd_diff15 4604 6923
## lo_ground_truth:conditionHOPs 3883 6402
## lo_ground_truth:conditionintervals 2624 5155
## lo_ground_truth:conditionQDPs 3097 5313
## meansTRUE:conditionHOPs 5004 7975
## meansTRUE:conditionintervals 4166 7118
## meansTRUE:conditionQDPs 3360 6887
## sd_diff15:conditionHOPs 5563 8427
## sd_diff15:conditionintervals 4837 7696
## sd_diff15:conditionQDPs 4933 7348
## lo_ground_truth:trial 5184 7714
## conditionHOPs:trial 7734 9209
## conditionintervals:trial 7053 8544
## conditionQDPs:trial 6489 8562
## lo_ground_truth:meansTRUE:sd_diff15 4471 7063
## lo_ground_truth:meansTRUE:conditionHOPs 6152 8179
## lo_ground_truth:meansTRUE:conditionintervals 5286 7658
## lo_ground_truth:meansTRUE:conditionQDPs 5111 7726
## lo_ground_truth:sd_diff15:conditionHOPs 5426 7541
## lo_ground_truth:sd_diff15:conditionintervals 5061 7618
## lo_ground_truth:sd_diff15:conditionQDPs 5074 7937
## meansTRUE:sd_diff15:conditionHOPs 5463 7689
## meansTRUE:sd_diff15:conditionintervals 4696 7370
## meansTRUE:sd_diff15:conditionQDPs 4774 6439
## lo_ground_truth:conditionHOPs:trial 5970 7927
## lo_ground_truth:conditionintervals:trial 5590 7231
## lo_ground_truth:conditionQDPs:trial 5606 8186
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 4880 7386
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 5119 7528
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 4703 7392
## sigma_lo_ground_truth 2779 4760
## sigma_conditionHOPs 1790 3398
## sigma_conditionintervals 1928 3892
## sigma_conditionQDPs 1965 3892
## sigma_trial 6200 7085
## sigma_lo_ground_truth:conditionHOPs 2656 5103
## sigma_lo_ground_truth:conditionintervals 2850 4402
## sigma_lo_ground_truth:conditionQDPs 3096 5551
## sigma_lo_ground_truth:trial 6865 8276
## sigma_conditionHOPs:trial 6447 7923
## sigma_conditionintervals:trial 6536 7854
## sigma_conditionQDPs:trial 5827 6938
## sigma_lo_ground_truth:conditionHOPs:trial 7954 8168
## sigma_lo_ground_truth:conditionintervals:trial 7102 8679
## sigma_lo_ground_truth:conditionQDPs:trial 7095 8381
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Model Comparison
Each time we add a random effect, the number of parameters multiplies, especially since the random effects in each submodel share a covariance matrix. We want to make sure these parameters contribute more to the predictive validity of the model than they risk overfitting. We’ll evaluate this by comparing models on WAIC. The model with the smallest WAIC has the best estimated out-of-sample predictive performance after accounting for the effective number of parameters.
waic(
m.m.llo,
m.m.llo.r_means.sd,
m.m.llo.r_means.sd.sigma_gt,
m.m.llo.r_means.sd.trial.sigma_gt,
m.m.llo.r_means.sd.trial.sigma_gt.trial)
## Output of model 'm.m.llo':
##
## Computed from 10000 by 19924 log-likelihood matrix
##
## Estimate SE
## elpd_waic -13183.3 216.9
## p_waic 2454.7 70.8
## waic 26366.7 433.8
##
## 1125 (5.6%) p_waic estimates greater than 0.4. We recommend trying loo instead.
##
## Output of model 'm.m.llo.r_means.sd':
##
## Computed from 10000 by 19924 log-likelihood matrix
##
## Estimate SE
## elpd_waic -12047.5 216.5
## p_waic 3080.7 74.1
## waic 24095.0 432.9
##
## 1453 (7.3%) p_waic estimates greater than 0.4. We recommend trying loo instead.
##
## Output of model 'm.m.llo.r_means.sd.sigma_gt':
##
## Computed from 10000 by 19924 log-likelihood matrix
##
## Estimate SE
## elpd_waic -9697.7 214.1
## p_waic 2887.0 57.2
## waic 19395.5 428.3
##
## 1292 (6.5%) p_waic estimates greater than 0.4. We recommend trying loo instead.
##
## Output of model 'm.m.llo.r_means.sd.trial.sigma_gt':
##
## Computed from 10000 by 19924 log-likelihood matrix
##
## Estimate SE
## elpd_waic -9304.8 213.4
## p_waic 3099.9 57.3
## waic 18609.5 426.8
##
## 1480 (7.4%) p_waic estimates greater than 0.4. We recommend trying loo instead.
##
## Output of model 'm.m.llo.r_means.sd.trial.sigma_gt.trial':
##
## Computed from 10000 by 19924 log-likelihood matrix
##
## Estimate SE
## elpd_waic -7806.6 206.2
## p_waic 3419.6 48.1
## waic 15613.1 412.3
##
## 1726 (8.7%) p_waic estimates greater than 0.4. We recommend trying loo instead.
##
## Model comparisons:
## elpd_diff se_diff
## m.m.llo.r_means.sd.trial.sigma_gt.trial 0.0 0.0
## m.m.llo.r_means.sd.trial.sigma_gt -1498.2 65.4
## m.m.llo.r_means.sd.sigma_gt -1891.2 72.2
## m.m.llo.r_means.sd -4240.9 132.9
## m.m.llo -5376.8 133.1
The most complex model has the lowest WAIC value, so we’ll continue expanding on it.
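As a sanity check on the comparison table, note that elpd_diff is just the difference in elpd_waic relative to the best model, and waic is -2 times elpd_waic. Using values copied from the output above:

```r
# elpd_waic values copied from the WAIC output above
elpd_best <- -7806.6  # m.m.llo.r_means.sd.trial.sigma_gt.trial
elpd_next <- -9304.8  # m.m.llo.r_means.sd.trial.sigma_gt

elpd_next - elpd_best  # -1498.2, matching elpd_diff in the comparison table
-2 * elpd_best         # 15613.2, matching that model's waic up to rounding
```

Given the repeated p_waic warnings above, rerunning this comparison with loo() would be a more robust check in practice, as the output itself recommends.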
Add Predictors for Block Order
Let’s add block order to our previous model, just to check whether the effect of the mean on judgments depends on block order. We’ll model this as a fixed-effects interaction between block order and the presence or absence of means. This will be the maximal model under our strategy of model expansion.
We use the same priors as we did for the previous model. Now, let’s fit the model to our data.
# hierarchical LLO model
m.max <- brm(data = model_df, family = "gaussian",
formula = bf(lo_p_sup ~ (1 + lo_ground_truth*trial + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff*condition*start_means + lo_ground_truth*condition*trial,
sigma ~ (1 + lo_ground_truth + trial|worker_id) + lo_ground_truth*condition*trial + means*start_means),
prior = c(prior(normal(1, 0.5), class = b),
prior(normal(1.3, 1), class = Intercept),
prior(normal(0, 0.15), class = sd, group = worker_id),
prior(normal(0, 0.3), class = b, dpar = sigma),
prior(normal(0, 0.15), class = sd, dpar = sigma),
prior(lkj(4), class = cor)),
iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
control = list(adapt_delta = 0.99, max_treedepth = 12),
file = "model-fits/llo_mdl-min-r_means_sd_trial_block_sigma_gt_trial_means_block-build_version")
summary(m.max)
## Family: gaussian
## Links: mu = identity; sigma = log
## Formula: lo_p_sup ~ (1 + lo_ground_truth * trial + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff * condition * start_means + lo_ground_truth * condition * trial
## sigma ~ (1 + lo_ground_truth + trial | worker_id) + lo_ground_truth * condition * trial + means * start_means
## Data: model_df (Number of observations: 19924)
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
## total post-warmup samples = 10000
##
## Group-Level Effects:
## ~worker_id (Number of levels: 623)
## Estimate Est.Error l-95% CI
## sd(Intercept) 0.06 0.01 0.05
## sd(lo_ground_truth) 0.39 0.01 0.37
## sd(trial) 0.03 0.01 0.00
## sd(meansTRUE) 0.03 0.01 0.02
## sd(sd_diff15) 0.08 0.01 0.07
## sd(lo_ground_truth:trial) 0.24 0.01 0.21
## sd(meansTRUE:sd_diff15) 0.06 0.01 0.04
## sd(sigma_Intercept) 1.18 0.03 1.12
## sd(sigma_lo_ground_truth) 0.41 0.01 0.38
## sd(sigma_trial) 1.19 0.04 1.12
## cor(Intercept,lo_ground_truth) -0.47 0.09 -0.64
## cor(Intercept,trial) 0.20 0.23 -0.28
## cor(lo_ground_truth,trial) -0.25 0.23 -0.64
## cor(Intercept,meansTRUE) 0.04 0.18 -0.29
## cor(lo_ground_truth,meansTRUE) -0.60 0.13 -0.81
## cor(trial,meansTRUE) 0.21 0.25 -0.31
## cor(Intercept,sd_diff15) -0.02 0.11 -0.23
## cor(lo_ground_truth,sd_diff15) 0.03 0.09 -0.14
## cor(trial,sd_diff15) 0.02 0.21 -0.40
## cor(meansTRUE,sd_diff15) -0.00 0.16 -0.34
## cor(Intercept,lo_ground_truth:trial) -0.27 0.10 -0.45
## cor(lo_ground_truth,lo_ground_truth:trial) 0.40 0.06 0.28
## cor(trial,lo_ground_truth:trial) -0.36 0.23 -0.72
## cor(meansTRUE,lo_ground_truth:trial) -0.13 0.16 -0.43
## cor(sd_diff15,lo_ground_truth:trial) 0.06 0.09 -0.10
## cor(Intercept,meansTRUE:sd_diff15) -0.33 0.14 -0.58
## cor(lo_ground_truth,meansTRUE:sd_diff15) 0.23 0.13 -0.04
## cor(trial,meansTRUE:sd_diff15) 0.17 0.22 -0.28
## cor(meansTRUE,meansTRUE:sd_diff15) 0.03 0.18 -0.33
## cor(sd_diff15,meansTRUE:sd_diff15) -0.30 0.12 -0.51
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15) -0.12 0.12 -0.36
## cor(sigma_Intercept,sigma_lo_ground_truth) -0.71 0.02 -0.75
## cor(sigma_Intercept,sigma_trial) 0.10 0.04 0.02
## cor(sigma_lo_ground_truth,sigma_trial) -0.05 0.04 -0.14
## u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.07 1.00 3205 6294
## sd(lo_ground_truth) 0.42 1.00 3332 6907
## sd(trial) 0.06 1.00 1235 2547
## sd(meansTRUE) 0.05 1.00 1379 2192
## sd(sd_diff15) 0.09 1.00 4032 7833
## sd(lo_ground_truth:trial) 0.27 1.00 1677 5455
## sd(meansTRUE:sd_diff15) 0.07 1.00 3520 7180
## sd(sigma_Intercept) 1.24 1.00 2740 4467
## sd(sigma_lo_ground_truth) 0.43 1.00 3982 6582
## sd(sigma_trial) 1.27 1.00 5467 7404
## cor(Intercept,lo_ground_truth) -0.29 1.00 587 1241
## cor(Intercept,trial) 0.61 1.00 6065 7549
## cor(lo_ground_truth,trial) 0.26 1.00 4894 5531
## cor(Intercept,meansTRUE) 0.41 1.00 2402 5259
## cor(lo_ground_truth,meansTRUE) -0.30 1.00 2874 5664
## cor(trial,meansTRUE) 0.63 1.00 1796 3322
## cor(Intercept,sd_diff15) 0.19 1.00 2076 4506
## cor(lo_ground_truth,sd_diff15) 0.20 1.00 3829 7001
## cor(trial,sd_diff15) 0.44 1.00 345 816
## cor(meansTRUE,sd_diff15) 0.31 1.00 627 1393
## cor(Intercept,lo_ground_truth:trial) -0.07 1.00 1131 2593
## cor(lo_ground_truth,lo_ground_truth:trial) 0.52 1.00 6278 8237
## cor(trial,lo_ground_truth:trial) 0.18 1.01 286 427
## cor(meansTRUE,lo_ground_truth:trial) 0.18 1.00 544 1427
## cor(sd_diff15,lo_ground_truth:trial) 0.23 1.00 2636 5672
## cor(Intercept,meansTRUE:sd_diff15) -0.05 1.00 3043 6243
## cor(lo_ground_truth,meansTRUE:sd_diff15) 0.47 1.00 4341 8138
## cor(trial,meansTRUE:sd_diff15) 0.57 1.00 1029 2030
## cor(meansTRUE,meansTRUE:sd_diff15) 0.39 1.00 2070 4493
## cor(sd_diff15,meansTRUE:sd_diff15) -0.04 1.00 3507 7193
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15) 0.12 1.00 3241 6173
## cor(sigma_Intercept,sigma_lo_ground_truth) -0.67 1.00 3969 6400
## cor(sigma_Intercept,sigma_trial) 0.18 1.00 4842 6926
## cor(sigma_lo_ground_truth,sigma_trial) 0.03 1.00 4098 6587
##
## Population-Level Effects:
## Estimate
## Intercept -0.02
## sigma_Intercept -1.71
## lo_ground_truth 0.45
## meansTRUE -0.00
## sd_diff15 0.04
## conditionHOPs -0.09
## conditionintervals -0.01
## conditionQDPs 0.02
## start_meansTRUE 0.01
## trial -0.06
## lo_ground_truth:meansTRUE -0.05
## lo_ground_truth:sd_diff15 0.08
## meansTRUE:sd_diff15 0.02
## lo_ground_truth:conditionHOPs -0.01
## lo_ground_truth:conditionintervals -0.10
## lo_ground_truth:conditionQDPs 0.07
## meansTRUE:conditionHOPs 0.08
## meansTRUE:conditionintervals 0.01
## meansTRUE:conditionQDPs -0.02
## sd_diff15:conditionHOPs 0.03
## sd_diff15:conditionintervals 0.02
## sd_diff15:conditionQDPs -0.01
## lo_ground_truth:start_meansTRUE -0.14
## meansTRUE:start_meansTRUE -0.02
## sd_diff15:start_meansTRUE 0.00
## conditionHOPs:start_meansTRUE 0.08
## conditionintervals:start_meansTRUE 0.00
## conditionQDPs:start_meansTRUE -0.01
## lo_ground_truth:trial 0.13
## conditionHOPs:trial 0.01
## conditionintervals:trial 0.04
## conditionQDPs:trial 0.05
## lo_ground_truth:meansTRUE:sd_diff15 0.05
## lo_ground_truth:meansTRUE:conditionHOPs -0.08
## lo_ground_truth:meansTRUE:conditionintervals -0.01
## lo_ground_truth:meansTRUE:conditionQDPs -0.00
## lo_ground_truth:sd_diff15:conditionHOPs 0.06
## lo_ground_truth:sd_diff15:conditionintervals -0.01
## lo_ground_truth:sd_diff15:conditionQDPs 0.03
## meansTRUE:sd_diff15:conditionHOPs -0.00
## meansTRUE:sd_diff15:conditionintervals -0.02
## meansTRUE:sd_diff15:conditionQDPs 0.00
## lo_ground_truth:meansTRUE:start_meansTRUE 0.04
## lo_ground_truth:sd_diff15:start_meansTRUE 0.03
## meansTRUE:sd_diff15:start_meansTRUE -0.01
## lo_ground_truth:conditionHOPs:start_meansTRUE -0.07
## lo_ground_truth:conditionintervals:start_meansTRUE 0.03
## lo_ground_truth:conditionQDPs:start_meansTRUE 0.14
## meansTRUE:conditionHOPs:start_meansTRUE -0.09
## meansTRUE:conditionintervals:start_meansTRUE 0.01
## meansTRUE:conditionQDPs:start_meansTRUE 0.02
## sd_diff15:conditionHOPs:start_meansTRUE -0.02
## sd_diff15:conditionintervals:start_meansTRUE -0.01
## sd_diff15:conditionQDPs:start_meansTRUE -0.02
## lo_ground_truth:conditionHOPs:trial -0.03
## lo_ground_truth:conditionintervals:trial 0.00
## lo_ground_truth:conditionQDPs:trial -0.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.04
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE 0.04
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 0.12
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE 0.03
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE -0.01
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE 0.01
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE 0.00
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE -0.02
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.04
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.02
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.01
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE -0.07
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE -0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.00
## sigma_lo_ground_truth 0.45
## sigma_conditionHOPs 0.58
## sigma_conditionintervals 0.16
## sigma_conditionQDPs -0.05
## sigma_trial -0.45
## sigma_meansTRUE 0.00
## sigma_start_meansTRUE -0.04
## sigma_lo_ground_truth:conditionHOPs -0.17
## sigma_lo_ground_truth:conditionintervals -0.10
## sigma_lo_ground_truth:conditionQDPs -0.03
## sigma_lo_ground_truth:trial 0.02
## sigma_conditionHOPs:trial 0.06
## sigma_conditionintervals:trial 0.12
## sigma_conditionQDPs:trial -0.06
## sigma_meansTRUE:start_meansTRUE -0.23
## sigma_lo_ground_truth:conditionHOPs:trial 0.05
## sigma_lo_ground_truth:conditionintervals:trial 0.06
## sigma_lo_ground_truth:conditionQDPs:trial -0.02
## Est.Error
## Intercept 0.02
## sigma_Intercept 0.09
## lo_ground_truth 0.04
## meansTRUE 0.02
## sd_diff15 0.02
## conditionHOPs 0.03
## conditionintervals 0.02
## conditionQDPs 0.02
## start_meansTRUE 0.02
## trial 0.02
## lo_ground_truth:meansTRUE 0.02
## lo_ground_truth:sd_diff15 0.02
## meansTRUE:sd_diff15 0.02
## lo_ground_truth:conditionHOPs 0.07
## lo_ground_truth:conditionintervals 0.06
## lo_ground_truth:conditionQDPs 0.06
## meansTRUE:conditionHOPs 0.03
## meansTRUE:conditionintervals 0.02
## meansTRUE:conditionQDPs 0.03
## sd_diff15:conditionHOPs 0.04
## sd_diff15:conditionintervals 0.03
## sd_diff15:conditionQDPs 0.03
## lo_ground_truth:start_meansTRUE 0.06
## meansTRUE:start_meansTRUE 0.03
## sd_diff15:start_meansTRUE 0.03
## conditionHOPs:start_meansTRUE 0.04
## conditionintervals:start_meansTRUE 0.03
## conditionQDPs:start_meansTRUE 0.03
## lo_ground_truth:trial 0.03
## conditionHOPs:trial 0.04
## conditionintervals:trial 0.03
## conditionQDPs:trial 0.03
## lo_ground_truth:meansTRUE:sd_diff15 0.02
## lo_ground_truth:meansTRUE:conditionHOPs 0.04
## lo_ground_truth:meansTRUE:conditionintervals 0.03
## lo_ground_truth:meansTRUE:conditionQDPs 0.03
## lo_ground_truth:sd_diff15:conditionHOPs 0.03
## lo_ground_truth:sd_diff15:conditionintervals 0.02
## lo_ground_truth:sd_diff15:conditionQDPs 0.03
## meansTRUE:sd_diff15:conditionHOPs 0.04
## meansTRUE:sd_diff15:conditionintervals 0.03
## meansTRUE:sd_diff15:conditionQDPs 0.03
## lo_ground_truth:meansTRUE:start_meansTRUE 0.03
## lo_ground_truth:sd_diff15:start_meansTRUE 0.02
## meansTRUE:sd_diff15:start_meansTRUE 0.03
## lo_ground_truth:conditionHOPs:start_meansTRUE 0.09
## lo_ground_truth:conditionintervals:start_meansTRUE 0.09
## lo_ground_truth:conditionQDPs:start_meansTRUE 0.09
## meansTRUE:conditionHOPs:start_meansTRUE 0.05
## meansTRUE:conditionintervals:start_meansTRUE 0.04
## meansTRUE:conditionQDPs:start_meansTRUE 0.04
## sd_diff15:conditionHOPs:start_meansTRUE 0.05
## sd_diff15:conditionintervals:start_meansTRUE 0.04
## sd_diff15:conditionQDPs:start_meansTRUE 0.04
## lo_ground_truth:conditionHOPs:trial 0.05
## lo_ground_truth:conditionintervals:trial 0.04
## lo_ground_truth:conditionQDPs:trial 0.05
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 0.03
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE 0.03
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 0.05
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE 0.04
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE 0.04
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE 0.04
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE 0.03
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE 0.03
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.05
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.04
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.05
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.04
## sigma_lo_ground_truth 0.03
## sigma_conditionHOPs 0.12
## sigma_conditionintervals 0.12
## sigma_conditionQDPs 0.12
## sigma_trial 0.10
## sigma_meansTRUE 0.03
## sigma_start_meansTRUE 0.07
## sigma_lo_ground_truth:conditionHOPs 0.05
## sigma_lo_ground_truth:conditionintervals 0.05
## sigma_lo_ground_truth:conditionQDPs 0.05
## sigma_lo_ground_truth:trial 0.05
## sigma_conditionHOPs:trial 0.14
## sigma_conditionintervals:trial 0.14
## sigma_conditionQDPs:trial 0.14
## sigma_meansTRUE:start_meansTRUE 0.05
## sigma_lo_ground_truth:conditionHOPs:trial 0.07
## sigma_lo_ground_truth:conditionintervals:trial 0.07
## sigma_lo_ground_truth:conditionQDPs:trial 0.07
## l-95% CI
## Intercept -0.05
## sigma_Intercept -1.89
## lo_ground_truth 0.36
## meansTRUE -0.04
## sd_diff15 0.00
## conditionHOPs -0.14
## conditionintervals -0.05
## conditionQDPs -0.02
## start_meansTRUE -0.03
## trial -0.10
## lo_ground_truth:meansTRUE -0.09
## lo_ground_truth:sd_diff15 0.04
## meansTRUE:sd_diff15 -0.03
## lo_ground_truth:conditionHOPs -0.14
## lo_ground_truth:conditionintervals -0.22
## lo_ground_truth:conditionQDPs -0.05
## meansTRUE:conditionHOPs 0.02
## meansTRUE:conditionintervals -0.04
## meansTRUE:conditionQDPs -0.07
## sd_diff15:conditionHOPs -0.04
## sd_diff15:conditionintervals -0.04
## sd_diff15:conditionQDPs -0.07
## lo_ground_truth:start_meansTRUE -0.26
## meansTRUE:start_meansTRUE -0.07
## sd_diff15:start_meansTRUE -0.05
## conditionHOPs:start_meansTRUE 0.01
## conditionintervals:start_meansTRUE -0.05
## conditionQDPs:start_meansTRUE -0.07
## lo_ground_truth:trial 0.06
## conditionHOPs:trial -0.06
## conditionintervals:trial -0.02
## conditionQDPs:trial -0.01
## lo_ground_truth:meansTRUE:sd_diff15 0.00
## lo_ground_truth:meansTRUE:conditionHOPs -0.15
## lo_ground_truth:meansTRUE:conditionintervals -0.06
## lo_ground_truth:meansTRUE:conditionQDPs -0.06
## lo_ground_truth:sd_diff15:conditionHOPs 0.00
## lo_ground_truth:sd_diff15:conditionintervals -0.05
## lo_ground_truth:sd_diff15:conditionQDPs -0.03
## meansTRUE:sd_diff15:conditionHOPs -0.09
## meansTRUE:sd_diff15:conditionintervals -0.08
## meansTRUE:sd_diff15:conditionQDPs -0.06
## lo_ground_truth:meansTRUE:start_meansTRUE -0.02
## lo_ground_truth:sd_diff15:start_meansTRUE -0.02
## meansTRUE:sd_diff15:start_meansTRUE -0.07
## lo_ground_truth:conditionHOPs:start_meansTRUE -0.25
## lo_ground_truth:conditionintervals:start_meansTRUE -0.14
## lo_ground_truth:conditionQDPs:start_meansTRUE -0.04
## meansTRUE:conditionHOPs:start_meansTRUE -0.19
## meansTRUE:conditionintervals:start_meansTRUE -0.06
## meansTRUE:conditionQDPs:start_meansTRUE -0.05
## sd_diff15:conditionHOPs:start_meansTRUE -0.11
## sd_diff15:conditionintervals:start_meansTRUE -0.08
## sd_diff15:conditionQDPs:start_meansTRUE -0.09
## lo_ground_truth:conditionHOPs:trial -0.13
## lo_ground_truth:conditionintervals:trial -0.09
## lo_ground_truth:conditionQDPs:trial -0.09
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs -0.10
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals -0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs -0.10
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE -0.02
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 0.02
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE -0.05
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE -0.09
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE -0.06
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE -0.05
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE -0.08
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE -0.06
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE -0.05
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.07
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE -0.16
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE -0.11
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE -0.08
## sigma_lo_ground_truth 0.39
## sigma_conditionHOPs 0.36
## sigma_conditionintervals -0.07
## sigma_conditionQDPs -0.29
## sigma_trial -0.65
## sigma_meansTRUE -0.06
## sigma_start_meansTRUE -0.18
## sigma_lo_ground_truth:conditionHOPs -0.27
## sigma_lo_ground_truth:conditionintervals -0.19
## sigma_lo_ground_truth:conditionQDPs -0.12
## sigma_lo_ground_truth:trial -0.08
## sigma_conditionHOPs:trial -0.22
## sigma_conditionintervals:trial -0.15
## sigma_conditionQDPs:trial -0.33
## sigma_meansTRUE:start_meansTRUE -0.32
## sigma_lo_ground_truth:conditionHOPs:trial -0.09
## sigma_lo_ground_truth:conditionintervals:trial -0.07
## sigma_lo_ground_truth:conditionQDPs:trial -0.15
## u-95% CI
## Intercept 0.01
## sigma_Intercept -1.53
## lo_ground_truth 0.54
## meansTRUE 0.03
## sd_diff15 0.08
## conditionHOPs -0.03
## conditionintervals 0.03
## conditionQDPs 0.06
## start_meansTRUE 0.06
## trial -0.01
## lo_ground_truth:meansTRUE -0.00
## lo_ground_truth:sd_diff15 0.11
## meansTRUE:sd_diff15 0.06
## lo_ground_truth:conditionHOPs 0.12
## lo_ground_truth:conditionintervals 0.03
## lo_ground_truth:conditionQDPs 0.19
## meansTRUE:conditionHOPs 0.15
## meansTRUE:conditionintervals 0.06
## meansTRUE:conditionQDPs 0.03
## sd_diff15:conditionHOPs 0.10
## sd_diff15:conditionintervals 0.07
## sd_diff15:conditionQDPs 0.05
## lo_ground_truth:start_meansTRUE -0.02
## meansTRUE:start_meansTRUE 0.04
## sd_diff15:start_meansTRUE 0.05
## conditionHOPs:start_meansTRUE 0.16
## conditionintervals:start_meansTRUE 0.06
## conditionQDPs:start_meansTRUE 0.04
## lo_ground_truth:trial 0.19
## conditionHOPs:trial 0.09
## conditionintervals:trial 0.10
## conditionQDPs:trial 0.11
## lo_ground_truth:meansTRUE:sd_diff15 0.10
## lo_ground_truth:meansTRUE:conditionHOPs -0.01
## lo_ground_truth:meansTRUE:conditionintervals 0.04
## lo_ground_truth:meansTRUE:conditionQDPs 0.05
## lo_ground_truth:sd_diff15:conditionHOPs 0.12
## lo_ground_truth:sd_diff15:conditionintervals 0.04
## lo_ground_truth:sd_diff15:conditionQDPs 0.08
## meansTRUE:sd_diff15:conditionHOPs 0.08
## meansTRUE:sd_diff15:conditionintervals 0.04
## meansTRUE:sd_diff15:conditionQDPs 0.07
## lo_ground_truth:meansTRUE:start_meansTRUE 0.10
## lo_ground_truth:sd_diff15:start_meansTRUE 0.07
## meansTRUE:sd_diff15:start_meansTRUE 0.04
## lo_ground_truth:conditionHOPs:start_meansTRUE 0.11
## lo_ground_truth:conditionintervals:start_meansTRUE 0.20
## lo_ground_truth:conditionQDPs:start_meansTRUE 0.31
## meansTRUE:conditionHOPs:start_meansTRUE 0.02
## meansTRUE:conditionintervals:start_meansTRUE 0.09
## meansTRUE:conditionQDPs:start_meansTRUE 0.10
## sd_diff15:conditionHOPs:start_meansTRUE 0.07
## sd_diff15:conditionintervals:start_meansTRUE 0.06
## sd_diff15:conditionQDPs:start_meansTRUE 0.05
## lo_ground_truth:conditionHOPs:trial 0.08
## lo_ground_truth:conditionintervals:trial 0.09
## lo_ground_truth:conditionQDPs:trial 0.09
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 0.05
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 0.08
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 0.03
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE 0.09
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 0.21
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE 0.10
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE 0.07
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE 0.08
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE 0.06
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE 0.04
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.15
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.10
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.09
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 0.07
## sigma_lo_ground_truth 0.52
## sigma_conditionHOPs 0.81
## sigma_conditionintervals 0.40
## sigma_conditionQDPs 0.18
## sigma_trial -0.25
## sigma_meansTRUE 0.06
## sigma_start_meansTRUE 0.10
## sigma_lo_ground_truth:conditionHOPs -0.08
## sigma_lo_ground_truth:conditionintervals -0.01
## sigma_lo_ground_truth:conditionQDPs 0.07
## sigma_lo_ground_truth:trial 0.11
## sigma_conditionHOPs:trial 0.34
## sigma_conditionintervals:trial 0.39
## sigma_conditionQDPs:trial 0.22
## sigma_meansTRUE:start_meansTRUE -0.13
## sigma_lo_ground_truth:conditionHOPs:trial 0.18
## sigma_lo_ground_truth:conditionintervals:trial 0.20
## sigma_lo_ground_truth:conditionQDPs:trial 0.12
## Rhat
## Intercept 1.00
## sigma_Intercept 1.00
## lo_ground_truth 1.00
## meansTRUE 1.00
## sd_diff15 1.00
## conditionHOPs 1.00
## conditionintervals 1.00
## conditionQDPs 1.00
## start_meansTRUE 1.00
## trial 1.00
## lo_ground_truth:meansTRUE 1.00
## lo_ground_truth:sd_diff15 1.00
## meansTRUE:sd_diff15 1.00
## lo_ground_truth:conditionHOPs 1.00
## lo_ground_truth:conditionintervals 1.00
## lo_ground_truth:conditionQDPs 1.00
## meansTRUE:conditionHOPs 1.00
## meansTRUE:conditionintervals 1.00
## meansTRUE:conditionQDPs 1.00
## sd_diff15:conditionHOPs 1.00
## sd_diff15:conditionintervals 1.00
## sd_diff15:conditionQDPs 1.00
## lo_ground_truth:start_meansTRUE 1.00
## meansTRUE:start_meansTRUE 1.00
## sd_diff15:start_meansTRUE 1.00
## conditionHOPs:start_meansTRUE 1.00
## conditionintervals:start_meansTRUE 1.00
## conditionQDPs:start_meansTRUE 1.00
## lo_ground_truth:trial 1.00
## conditionHOPs:trial 1.00
## conditionintervals:trial 1.00
## conditionQDPs:trial 1.00
## lo_ground_truth:meansTRUE:sd_diff15 1.00
## lo_ground_truth:meansTRUE:conditionHOPs 1.00
## lo_ground_truth:meansTRUE:conditionintervals 1.00
## lo_ground_truth:meansTRUE:conditionQDPs 1.00
## lo_ground_truth:sd_diff15:conditionHOPs 1.00
## lo_ground_truth:sd_diff15:conditionintervals 1.00
## lo_ground_truth:sd_diff15:conditionQDPs 1.00
## meansTRUE:sd_diff15:conditionHOPs 1.00
## meansTRUE:sd_diff15:conditionintervals 1.00
## meansTRUE:sd_diff15:conditionQDPs 1.00
## lo_ground_truth:meansTRUE:start_meansTRUE 1.00
## lo_ground_truth:sd_diff15:start_meansTRUE 1.00
## meansTRUE:sd_diff15:start_meansTRUE 1.00
## lo_ground_truth:conditionHOPs:start_meansTRUE 1.00
## lo_ground_truth:conditionintervals:start_meansTRUE 1.00
## lo_ground_truth:conditionQDPs:start_meansTRUE 1.00
## meansTRUE:conditionHOPs:start_meansTRUE 1.00
## meansTRUE:conditionintervals:start_meansTRUE 1.00
## meansTRUE:conditionQDPs:start_meansTRUE 1.00
## sd_diff15:conditionHOPs:start_meansTRUE 1.00
## sd_diff15:conditionintervals:start_meansTRUE 1.00
## sd_diff15:conditionQDPs:start_meansTRUE 1.00
## lo_ground_truth:conditionHOPs:trial 1.00
## lo_ground_truth:conditionintervals:trial 1.00
## lo_ground_truth:conditionQDPs:trial 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 1.00
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE 1.00
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE 1.00
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE 1.00
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE 1.00
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 1.00
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 1.00
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 1.00
## sigma_lo_ground_truth 1.00
## sigma_conditionHOPs 1.00
## sigma_conditionintervals 1.00
## sigma_conditionQDPs 1.00
## sigma_trial 1.00
## sigma_meansTRUE 1.00
## sigma_start_meansTRUE 1.00
## sigma_lo_ground_truth:conditionHOPs 1.00
## sigma_lo_ground_truth:conditionintervals 1.00
## sigma_lo_ground_truth:conditionQDPs 1.00
## sigma_lo_ground_truth:trial 1.00
## sigma_conditionHOPs:trial 1.00
## sigma_conditionintervals:trial 1.00
## sigma_conditionQDPs:trial 1.00
## sigma_meansTRUE:start_meansTRUE 1.00
## sigma_lo_ground_truth:conditionHOPs:trial 1.00
## sigma_lo_ground_truth:conditionintervals:trial 1.00
## sigma_lo_ground_truth:conditionQDPs:trial 1.00
## Bulk_ESS
## Intercept 2580
## sigma_Intercept 1655
## lo_ground_truth 3504
## meansTRUE 2453
## sd_diff15 2639
## conditionHOPs 3514
## conditionintervals 2964
## conditionQDPs 2871
## start_meansTRUE 2462
## trial 3463
## lo_ground_truth:meansTRUE 2696
## lo_ground_truth:sd_diff15 2505
## meansTRUE:sd_diff15 2658
## lo_ground_truth:conditionHOPs 4181
## lo_ground_truth:conditionintervals 3571
## lo_ground_truth:conditionQDPs 3670
## meansTRUE:conditionHOPs 3465
## meansTRUE:conditionintervals 2593
## meansTRUE:conditionQDPs 2767
## sd_diff15:conditionHOPs 3785
## sd_diff15:conditionintervals 3151
## sd_diff15:conditionQDPs 3202
## lo_ground_truth:start_meansTRUE 3467
## meansTRUE:start_meansTRUE 2400
## sd_diff15:start_meansTRUE 2584
## conditionHOPs:start_meansTRUE 3524
## conditionintervals:start_meansTRUE 2934
## conditionQDPs:start_meansTRUE 2567
## lo_ground_truth:trial 4230
## conditionHOPs:trial 4865
## conditionintervals:trial 4120
## conditionQDPs:trial 3855
## lo_ground_truth:meansTRUE:sd_diff15 2540
## lo_ground_truth:meansTRUE:conditionHOPs 3416
## lo_ground_truth:meansTRUE:conditionintervals 2894
## lo_ground_truth:meansTRUE:conditionQDPs 2967
## lo_ground_truth:sd_diff15:conditionHOPs 3141
## lo_ground_truth:sd_diff15:conditionintervals 2823
## lo_ground_truth:sd_diff15:conditionQDPs 2842
## meansTRUE:sd_diff15:conditionHOPs 3533
## meansTRUE:sd_diff15:conditionintervals 3032
## meansTRUE:sd_diff15:conditionQDPs 3287
## lo_ground_truth:meansTRUE:start_meansTRUE 2635
## lo_ground_truth:sd_diff15:start_meansTRUE 2445
## meansTRUE:sd_diff15:start_meansTRUE 2625
## lo_ground_truth:conditionHOPs:start_meansTRUE 4090
## lo_ground_truth:conditionintervals:start_meansTRUE 3643
## lo_ground_truth:conditionQDPs:start_meansTRUE 3720
## meansTRUE:conditionHOPs:start_meansTRUE 3535
## meansTRUE:conditionintervals:start_meansTRUE 2619
## meansTRUE:conditionQDPs:start_meansTRUE 2646
## sd_diff15:conditionHOPs:start_meansTRUE 3806
## sd_diff15:conditionintervals:start_meansTRUE 3192
## sd_diff15:conditionQDPs:start_meansTRUE 3112
## lo_ground_truth:conditionHOPs:trial 5206
## lo_ground_truth:conditionintervals:trial 4917
## lo_ground_truth:conditionQDPs:trial 5060
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 3055
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 2877
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 2801
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE 2424
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 3423
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE 2877
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE 2890
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE 3444
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE 2946
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE 2849
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 3464
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 3157
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 3243
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 3116
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 2893
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 2780
## sigma_lo_ground_truth 2449
## sigma_conditionHOPs 1577
## sigma_conditionintervals 1646
## sigma_conditionQDPs 1924
## sigma_trial 5088
## sigma_meansTRUE 8196
## sigma_start_meansTRUE 2687
## sigma_lo_ground_truth:conditionHOPs 2485
## sigma_lo_ground_truth:conditionintervals 2546
## sigma_lo_ground_truth:conditionQDPs 2642
## sigma_lo_ground_truth:trial 7275
## sigma_conditionHOPs:trial 5451
## sigma_conditionintervals:trial 4927
## sigma_conditionQDPs:trial 5451
## sigma_meansTRUE:start_meansTRUE 8703
## sigma_lo_ground_truth:conditionHOPs:trial 7402
## sigma_lo_ground_truth:conditionintervals:trial 7054
## sigma_lo_ground_truth:conditionQDPs:trial 7836
## Tail_ESS
## Intercept 4762
## sigma_Intercept 3213
## lo_ground_truth 5086
## meansTRUE 4637
## sd_diff15 5257
## conditionHOPs 5745
## conditionintervals 5420
## conditionQDPs 4919
## start_meansTRUE 4345
## trial 5812
## lo_ground_truth:meansTRUE 5245
## lo_ground_truth:sd_diff15 5313
## meansTRUE:sd_diff15 5779
## lo_ground_truth:conditionHOPs 6754
## lo_ground_truth:conditionintervals 5562
## lo_ground_truth:conditionQDPs 6501
## meansTRUE:conditionHOPs 6165
## meansTRUE:conditionintervals 4680
## meansTRUE:conditionQDPs 5192
## sd_diff15:conditionHOPs 6343
## sd_diff15:conditionintervals 5559
## sd_diff15:conditionQDPs 6196
## lo_ground_truth:start_meansTRUE 5426
## meansTRUE:start_meansTRUE 4675
## sd_diff15:start_meansTRUE 5065
## conditionHOPs:start_meansTRUE 6332
## conditionintervals:start_meansTRUE 4873
## conditionQDPs:start_meansTRUE 4070
## lo_ground_truth:trial 6496
## conditionHOPs:trial 7227
## conditionintervals:trial 6041
## conditionQDPs:trial 6534
## lo_ground_truth:meansTRUE:sd_diff15 4524
## lo_ground_truth:meansTRUE:conditionHOPs 6114
## lo_ground_truth:meansTRUE:conditionintervals 5694
## lo_ground_truth:meansTRUE:conditionQDPs 5268
## lo_ground_truth:sd_diff15:conditionHOPs 6623
## lo_ground_truth:sd_diff15:conditionintervals 5344
## lo_ground_truth:sd_diff15:conditionQDPs 5599
## meansTRUE:sd_diff15:conditionHOPs 6233
## meansTRUE:sd_diff15:conditionintervals 5547
## meansTRUE:sd_diff15:conditionQDPs 5799
## lo_ground_truth:meansTRUE:start_meansTRUE 4775
## lo_ground_truth:sd_diff15:start_meansTRUE 5190
## meansTRUE:sd_diff15:start_meansTRUE 5272
## lo_ground_truth:conditionHOPs:start_meansTRUE 6134
## lo_ground_truth:conditionintervals:start_meansTRUE 6001
## lo_ground_truth:conditionQDPs:start_meansTRUE 6013
## meansTRUE:conditionHOPs:start_meansTRUE 5655
## meansTRUE:conditionintervals:start_meansTRUE 4972
## meansTRUE:conditionQDPs:start_meansTRUE 5144
## sd_diff15:conditionHOPs:start_meansTRUE 7108
## sd_diff15:conditionintervals:start_meansTRUE 5579
## sd_diff15:conditionQDPs:start_meansTRUE 5919
## lo_ground_truth:conditionHOPs:trial 7002
## lo_ground_truth:conditionintervals:trial 7467
## lo_ground_truth:conditionQDPs:trial 7159
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs 6430
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals 5071
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs 4493
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE 4851
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE 6432
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE 5980
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE 5374
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE 6616
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE 5255
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE 5729
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 6080
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 5534
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 5861
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE 6537
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 5046
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE 4737
## sigma_lo_ground_truth 4855
## sigma_conditionHOPs 3284
## sigma_conditionintervals 3342
## sigma_conditionQDPs 3827
## sigma_trial 7328
## sigma_meansTRUE 8609
## sigma_start_meansTRUE 4777
## sigma_lo_ground_truth:conditionHOPs 4456
## sigma_lo_ground_truth:conditionintervals 5287
## sigma_lo_ground_truth:conditionQDPs 4578
## sigma_lo_ground_truth:trial 8503
## sigma_conditionHOPs:trial 7735
## sigma_conditionintervals:trial 7564
## sigma_conditionQDPs:trial 7428
## sigma_meansTRUE:start_meansTRUE 8607
## sigma_lo_ground_truth:conditionHOPs:trial 8543
## sigma_lo_ground_truth:conditionintervals:trial 8990
## sigma_lo_ground_truth:conditionQDPs:trial 8508
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Model Comparison
Let’s see how this maximal model compares with our previous model.
waic(m.m.llo.r_means.sd.trial.sigma_gt.trial, m.max)
## Output of model 'm.m.llo.r_means.sd.trial.sigma_gt.trial':
##
## Computed from 10000 by 19924 log-likelihood matrix
##
## Estimate SE
## elpd_waic -7806.6 206.2
## p_waic 3419.6 48.1
## waic 15613.1 412.3
##
## 1726 (8.7%) p_waic estimates greater than 0.4. We recommend trying loo instead.
##
## Output of model 'm.max':
##
## Computed from 10000 by 19924 log-likelihood matrix
##
## Estimate SE
## elpd_waic -7761.2 206.2
## p_waic 3445.7 48.3
## waic 15522.4 412.3
##
## 1752 (8.8%) p_waic estimates greater than 0.4. We recommend trying loo instead.
##
## Model comparisons:
## elpd_diff se_diff
## m.max 0.0 0.0
## m.m.llo.r_means.sd.trial.sigma_gt.trial -45.4 12.3
It looks like adding predictors for block order improves fit somewhat, so we’ll run with the maximal version of the model that we managed to fit.
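As a quick sanity check on how these quantities relate: WAIC is on the deviance scale, so waic = -2 * elpd_waic, and elpd_diff in the comparison table is each model's elpd_waic minus that of the best model. Verifying the arithmetic against the estimates printed above (values copied from the output; this is illustration, not re-computation of the models):

```python
# elpd_waic estimates copied from the waic() output above
elpd_reduced = -7806.6  # m.m.llo.r_means.sd.trial.sigma_gt.trial
elpd_max = -7761.2      # m.max

# WAIC is -2 * elpd_waic (deviance scale)
waic_reduced = -2 * elpd_reduced  # matches the printed 15613.1 up to rounding
waic_max = -2 * elpd_max          # matches the printed 15522.4

# elpd_diff is relative to the best model (m.max, which gets 0.0)
elpd_diff = elpd_reduced - elpd_max  # -45.4, matching the comparison table

print(round(waic_reduced, 1), round(waic_max, 1), round(elpd_diff, 1))
```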
Predictive Checks
Let’s check our posterior predictive distribution.
# posterior predictive check
model_df %>%
select(lo_ground_truth, worker_id, means, sd_diff, condition, trial, start_means) %>%
add_predicted_draws(m.max, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
mutate(
# transform to probability units
post_p_sup = plogis(lo_p_sup)
) %>%
ggplot(aes(x = post_p_sup)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior predictive distribution for probability of superiority") +
theme(panel.grid = element_blank())

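The `plogis` call in the chunk above is the inverse-logit that maps predictions from the log odds scale back to probability units. A minimal sketch of the round trip, using pure-Python equivalents of R's `qlogis`/`plogis` (named `logit`/`inv_logit` here for illustration):

```python
from math import exp, log

def logit(p):
    # equivalent of R's qlogis: probability -> log odds
    return log(p / (1 - p))

def inv_logit(x):
    # equivalent of R's plogis: log odds -> probability
    return 1 / (1 + exp(-x))

# the round trip recovers the original probability
for p in (0.1, 0.5, 0.74, 0.9):
    assert abs(inv_logit(logit(p)) - p) < 1e-12

# log odds of 0 corresponds to probability 0.5
print(inv_logit(0.0))
```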
How do these predictions compare to the observed data?
# data density
model_df %>%
ggplot(aes(x = p_superiority)) +
geom_density(fill = "black", size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Data distribution for probability of superiority") +
theme(panel.grid = element_blank())

Let’s take a look at predictions per worker and visualization condition to get a more granular sense of our model fit.
model_check_df %>%
group_by(lo_ground_truth, worker_id, means, sd_diff, condition, trial, start_means) %>%
add_predicted_draws(m.max, n = 500) %>%
ggplot(aes(x = lo_ground_truth, y = lo_p_sup, color = condition, fill = condition)) +
geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = .25) +
geom_point(data = model_check_df) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
coord_cartesian(xlim = quantile(model_df$lo_ground_truth, c(0, 1)),
ylim = quantile(model_df$lo_p_sup, c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

What does this look like in probability units?
model_check_df %>%
group_by(lo_ground_truth, worker_id, means, sd_diff, condition, trial, start_means) %>%
add_predicted_draws(m.max, n = 500) %>%
ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_p_sup), color = condition, fill = condition)) +
geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = .25) +
geom_point(data = model_check_df) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
coord_cartesian(xlim = quantile(plogis(model_df$lo_ground_truth), c(0, 1)),
ylim = quantile(plogis(model_df$lo_p_sup), c(0, 1))) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ worker_id)

Order Effects
What does the posterior for the slope look like when means are present vs absent? We’ll split this based on uncertainty shown and block order (marginalizing across visualization conditions) to see if there is a difference in the effect of extrinsic means per block.
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.max, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, sd_diff, start_means, .draw) %>% # group by predictors to keep
summarise(slope = weighted.mean(slope)) %>% # marginalize out visualization condition and trial by taking a weighted average
ggplot(aes(x = slope, group = means, color = means, fill = means)) +
geom_density(alpha = 0.35) +
scale_x_continuous(expression(slope), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior for slopes for mean present/absent") +
theme(panel.grid = element_blank()) +
facet_grid(start_means ~ sd_diff)

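The slope extraction above relies on a simple identity: for a linear predictor, the difference between fitted values at `lo_ground_truth = 1` and `lo_ground_truth = 0` is exactly the slope, which is what `compare_levels` computes draw by draw. A sketch of the identity (the coefficients are illustrative, chosen near the population-level Intercept and lo_ground_truth estimates above, not taken from posterior draws):

```python
def fitted(x, intercept, slope):
    # linear predictor on the log odds scale
    return intercept + slope * x

intercept, slope = -0.02, 0.45  # illustrative values

# the difference of fits at x = 1 and x = 0 recovers the slope,
# mirroring what compare_levels does for each posterior draw
recovered = fitted(1, intercept, slope) - fitted(0, intercept, slope)
assert abs(recovered - slope) < 1e-12
print(recovered)
```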
This effect suggests that adding means is most harmful at low uncertainty when users start with them, and that adding means is helpful at high uncertainty in the second block of trials. This is a strange order effect, and it may be obscuring the signal for the overall effect of extrinsic means.
What does the posterior for the slope in each visualization condition look like, marginalizing across other predictors? Again, we’ll facet by block order to see if this has any impact on our results.
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.max, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(condition, start_means, .draw) %>% # group by predictors to keep
summarise(slope = weighted.mean(slope)) %>% # marginalize out means, uncertainty, and trial by taking a weighted average
ggplot(aes(x = slope, group = condition, color = condition, fill = condition)) +
geom_density(alpha = 0.35) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
scale_x_continuous(expression(slope), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior for slopes by visualization condition") +
theme(panel.grid = element_blank()) +
facet_grid(start_means ~ .)

It looks like LLO slopes are smaller (more biased) when users start the task with extrinsic means, except with quantile dotplots.
What if we break these marginal effects down into simple effects for the interaction of the presence/absence of the mean, uncertainty shown, block order, and visualization condition?
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.max, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, sd_diff, condition, start_means, .draw) %>% # group by predictors to keep
summarise(slope = weighted.mean(slope)) %>% # marginalize out trial by taking a weighted average
ggplot(aes(x = slope, y = condition, group = means, fill = means)) +
stat_slabh(alpha = 0.35) +
labs(subtitle = "Posterior for slopes for means * sd * block order * visualization condition") +
theme_minimal() +
facet_grid(start_means ~ sd_diff)

It looks like when participants start the task with extrinsic means, their LLO slopes become less biased when those means are removed, especially when uncertainty is low. When participants start the task without means, by contrast, LLO slopes become less biased when means are added only for intervals and densities at high levels of uncertainty. For HOPs, on the other hand, adding extrinsic means in the second block makes slopes more biased (despite the fact that users have more practice with HOPs by the second block).
Main Findings Adjusting for Order Effects
What is the effect of extrinsic means at high and low uncertainty in our four visualization conditions after adjusting for order effects?
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.max, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, sd_diff, condition, .draw) %>% # group by predictors to keep
summarise(slope = weighted.mean(slope)) %>% # marginalize out other predictors by taking a weighted average
ggplot(aes(x = slope, y = condition, group = means, fill = means)) +
stat_slabh(alpha = 0.35) +
labs(
title = "Posterior Slopes in Linear Log Odds Model",
x = "Slope",
y = "Visualization",
fill = "Means Present"
) +
theme_minimal() +
# theme(panel.grid.minor = element_blank()) +
facet_grid(. ~ sd_diff)

model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.max, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, sd_diff, condition, .draw) %>% # group by predictors to keep
summarise(slope = weighted.mean(slope)) %>% # marginalize out other predictors by taking a weighted average
compare_levels(slope, by = means) %>% # contrast mean present - absent
ggplot(aes(x = slope, y = condition)) +
stat_slabh(alpha = 0.35) +
labs(
title = "Effect of Means on LLO Slopes",
x = "Slope Difference (Means present - absent)",
y = "Visualization"
) +
theme_minimal() +
# theme(panel.grid.minor = element_blank()) +
facet_grid(. ~ sd_diff)

model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.max, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(means, sd_diff, condition, .draw) %>% # group by predictors to keep
summarise(slope = weighted.mean(slope)) %>% # marginalize out other predictors by taking a weighted average
compare_levels(slope, by = means) %>% # contrast mean present - absent
compare_levels(slope, by = sd_diff) %>% # contrast sd_diff high - low (direction depends on factor level order)
ggplot(aes(x = slope, y = condition)) +
stat_slabh(alpha = 0.35) +
labs(
title = "Difference in the Effect of Means at High vs. Low Uncertainty",
x = "Slope Difference (Effect of means at high - low uncertainty)",
y = "Visualization"
) +
theme_minimal()
# theme(panel.grid.minor = element_blank())

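The two chained `compare_levels` calls in the last chunk compute a difference-in-differences: the first contrasts slopes with means present vs. absent within each uncertainty level, and the second contrasts those effects across uncertainty levels. Assuming `compare_levels` orders `sd_diff` as high minus low, the quantity plotted is:

$$\Delta = (\beta_{means,\,high} - \beta_{no\ means,\,high}) - (\beta_{means,\,low} - \beta_{no\ means,\,low})$$

A positive $\Delta$ indicates that extrinsic means help (or hurt less) at high uncertainty relative to low uncertainty; a negative $\Delta$ indicates the reverse.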
It looks like extrinsic means lead to greater underestimation of probability of superiority (lower LLO slopes) when uncertainty is low, regardless of visualization condition. This is the effect we expected to see but which eluded us until we controlled for order effects. Surprisingly, the impact of extrinsic means does not seem to depend on the intrinsic salience of the mean in the uncertainty visualization conditions. At high levels of uncertainty, extrinsic means improve slopes for intervals and densities but still reduce slopes for HOPs. These results suggest that adding extrinsic means is not a good design choice for HOPs or when the distributions visualized on a common axis differ in their variance.
What about the slopes in each visualization condition after adjusting for order effects?
model_df %>%
group_by(means, sd_diff, condition, trial, start_means) %>%
data_grid(lo_ground_truth = c(0, 1)) %>% # get fitted draws (in log odds units) only for ground truth of 0 and 1
add_fitted_draws(m.max, re_formula = NA) %>%
compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
rename(slope = .value) %>%
group_by(condition, .draw) %>% # group by predictors to keep
summarise(slope = weighted.mean(slope)) %>% # marginalize out all predictors except visualization condition by taking a weighted average
ggplot(aes(x = slope, group = condition, color = condition, fill = condition)) +
geom_density(alpha = 0.35) +
scale_fill_brewer(type = "qual", palette = 2) +
scale_color_brewer(type = "qual", palette = 2) +
scale_x_continuous(expression(slope), expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Posterior for slopes by visualization condition") +
theme(panel.grid = element_blank())
